
docs(sqs): proposals for Phase 3.C (throttling) + 3.D (split-queue FIFO)#664

Merged
bootjp merged 47 commits into main from
docs/sqs-phase3-proposals
Apr 28, 2026

Conversation

@bootjp
Owner

@bootjp bootjp commented Apr 26, 2026

Summary

Docs-only PR. Two design proposals for the remaining Phase 3 SQS items per docs/design/2026_04_24_proposed_sqs_compatible_adapter.md Section 14 (Phase 3 bullets). Both items were explicitly called out as needing separate design docs before any implementation work; this PR lands those proposals so the implementation PRs have a reviewed architecture to build on.

3.C — Per-queue throttling and tenant fairness (proposal)

Per-queue token-bucket throttling configured on queue meta (no separate keyspace), evaluated at the SQS adapter layer on the leader (no Raft per request), surfaced as the AWS Throttling error envelope so SDK retry/backoff just works.

Key decisions:

  • Default-off. Existing queues are unaffected; operators opt in per queue via SetQueueAttributes (see the configuration sketch after this list).
  • Per-action buckets (Send / Receive / Default) so a slow consumer cannot pin the producer.
  • Per-leader buckets, no replication. Worst case on failover: one extra burst on the new leader. Acceptable per AWS-equivalent behaviour at region failover boundaries; replicating would cost a Raft commit per SendMessage.
  • Batch verbs charge by entry count, not call count, with all-or-nothing rejection (matches AWS).
  • Admin-only configuration plane. Standard SQS clients see InvalidAttributeName on the Throttle* attributes (matches AWS behaviour for unknown attributes); the data-plane enforcement runs for everyone.
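
For concreteness, a hedged sketch of opting a queue into send-side throttling with the AWS Go SDK under admin credentials. The attribute names here (ThrottleSendRefillPerSecond, ThrottleSendBurst) and the endpoint URL are illustrative assumptions; the proposal only fixes the Throttle* prefix and per-action refill/burst knobs.

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/sqs"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx) // assumes admin credentials in the environment
	if err != nil {
		log.Fatal(err)
	}
	client := sqs.NewFromConfig(cfg)

	queueURL := "http://localhost:9000/queues/orders" // hypothetical elastickv endpoint

	// Standard SQS principals would get InvalidAttributeName for Throttle*
	// attributes; only the admin plane may set them. Data-plane enforcement
	// still applies to every caller once configured.
	_, err = client.SetQueueAttributes(ctx, &sqs.SetQueueAttributesInput{
		QueueUrl: &queueURL,
		Attributes: map[string]string{
			"ThrottleSendRefillPerSecond": "100", // steady-state Send tokens per second
			"ThrottleSendBurst":           "200", // bucket capacity (burst headroom)
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```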

3.D — Split-queue FIFO (high-throughput FIFO) (proposal)

Per-MessageGroupId hash partitioning across multiple Raft groups, mirroring AWS High Throughput FIFO. Within-group ordering preserved; across-group throughput scales with the partition count.

Key decisions:

  • Existing single-partition FIFO queues stay byte-identical (PartitionCount = 0 path is the legacy layout; no migration runs implicitly).
  • Power-of-two partition counts only (1, 2, 4, 8, 16, 32) so the routing step is hash & (N-1) and future offline rebuilds stay tractable (see the routing sketch after this list).
  • Partition count is immutable after first SendMessage. Live re-partitioning would break ordering for in-flight messages of every group whose hash bucket changed; out of scope.
  • Multi-PR rollout plan with an explicit "gate of no return" called out at PR 5 (the data-plane PR). PRs 1–4 are reversible no-ops on data layout; once a partitioned FIFO holds real data, rollback means draining and recreating the queue.
  • FNV-1a hash (deterministic across processes / Go versions / architectures). Risk of attacker-controlled MessageGroupId pinning all traffic to one partition is documented and accepted (the feature is for cooperative operators).
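
The routing step, as a sketch. This follows the §3.3 pseudocode quoted later in the thread (including the perQueue short-circuit added during review); the QueueMeta field names are assumptions.

```go
package partition

import "hash/fnv"

type QueueMeta struct {
	PartitionCount      uint32 // 0 or 1 selects the legacy single-partition layout
	FifoThroughputLimit string // "perMessageGroupId" or "perQueue"
}

// partitionFor maps a MessageGroupId to a partition index in [0, N).
func partitionFor(meta QueueMeta, messageGroupID string) uint32 {
	// Legacy queues and perQueue mode collapse every group to partition 0.
	if meta.PartitionCount <= 1 || meta.FifoThroughputLimit == "perQueue" {
		return 0
	}
	// FNV-1a is deterministic across processes, Go versions and architectures.
	h := fnv.New32a()
	_, _ = h.Write([]byte(messageGroupID))
	// PartitionCount is validated to be a power of two, so masking with
	// N-1 is equivalent to a modulo but cheaper.
	return h.Sum32() & (meta.PartitionCount - 1)
}
```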

Test plan

Self-review

This is a docs-only PR; the 5-lens self-review collapses to:

  1. Data loss / Concurrency / Performance / Consistency — N/A, no code touched.
  2. Test coverage — N/A, no code touched. The proposals themselves include a Testing Strategy section (§6 / §9) so the implementation PRs have explicit acceptance criteria.

Stacking

This PR is independent of #650, #659, and #662. Branched from current main. Merge whenever ready — landing the proposal docs early lets reviewers comment on the architecture before the implementation PRs go up.

Summary by CodeRabbit

  • Documentation
    • Added detailed designs for optional per-queue token-bucket throttling (rules, batch semantics, AWS-shaped Throttling errors with Retry-After, metrics, rollout default-off) and for HT‑FIFO “split-queue” partitioning (routing, immutability, gating/rollout).
  • New Features
    • Optional queue-level send/receive throttling with batch-aware charging and Retry-After; HT‑FIFO partitioning with deterministic partition routing and immutable partition attributes.
  • Tests
    • Extensive unit and integration tests for throttling, HT‑FIFO partitioning, attribute validation, immutability, idempotency, and cache invalidation.

Two design proposals for the remaining Phase 3 SQS items per
docs/design/2026_04_24_proposed_sqs_compatible_adapter.md (Section
14, Phase 3 bullets). Both are explicitly called out as needing
separate design docs before implementation; this lands them so
implementation work has a reviewed architecture to start from.

3.C — per-queue token-bucket throttling, configured on queue meta,
evaluated at the SQS adapter layer on the leader. Default-off,
AWS-shaped Throttling error envelope, per-leader buckets (no Raft
per request). Proposal covers: bucket model, charging per verb /
batch entry, configuration validation rules, AWS-compatibility
posture (admin-only attribute, hidden from standard principals),
multi-shard correctness, observability, alternatives, open
questions.

3.D — high-throughput FIFO via per-MessageGroupId hash partitioning
across multiple Raft groups. Multi-PR rollout plan with an explicit
"gate of no return" called out at the data-plane PR. Existing
single-partition FIFO queues stay byte-identical (PartitionCount=0
path). Proposal covers: data model and key encoding, routing,
SendMessage / ReceiveMessage / PurgeQueue / DeleteMessage flows,
shard router config syntax, reaper implications, migration story
(no live re-partitioning), failure modes, mixed-version cluster
behaviour, alternatives, open questions.

No code changes; review-only PR.
@coderabbitai

coderabbitai Bot commented Apr 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

📝 Walkthrough

Adds two design proposals (per-queue token-bucket throttling and HT‑FIFO split-queue partitioning) and implements runtime support: queue metadata parsing/validation, deterministic partition mapping, an in-memory per-queue token-bucket store with eviction/invalidation, handler-level charging for send/receive (including batch semantics), and extensive unit and integration tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Design docs: docs/design/2026_04_26_proposed_sqs_per_queue_throttling.md, docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md | Add two proposals: per-queue token-bucket throttling (config, validation, wire error envelopes, Retry-After, metrics, rollout) and HT‑FIFO split-queue design (partitioning model, keyspace, immutable attributes, routing, rollout/tests). |
| Throttle implementation & wiring: adapter/sqs_throttle.go, adapter/sqs.go | New in-memory token-bucket store keyed by queue+action+incarnation with refill/charge logic, reconciliation, eviction sweep loop, cache invalidation, Retry-After computation, and SQSServer wiring including a Throttling error helper. |
| Throttle unit & integration tests: adapter/sqs_throttle_test.go, adapter/sqs_throttle_integration_test.go | Extensive unit tests for bucket math, concurrency, invalidation and validation rules; integration tests exercising throttle enforcement, batch semantics, cache invalidation, recreate semantics and wire-level error shapes. |
| Catalog / metadata parsing & validation: adapter/sqs_catalog.go, adapter/sqs_catalog_test.go | Persist and validate new Throttle*, HT‑FIFO attributes and Incarnation; refactor SetQueueAttributes validation; idempotency checks include throttle/HT‑FIFO fields; selective bucket invalidation on attribute changes. |
| Message handlers & batch charging: adapter/sqs_messages.go, adapter/sqs_messages_batch.go | Refactor send into prepare/validate phases; add pre-dispatch charge for send/receive (including delete/change-visibility charging); batch handlers charge by entry count and abort early if throttled. |
| HT‑FIFO partitioning runtime: adapter/sqs_partitioning.go | New partitioning helpers: power-of-two validation, FNV-1a MessageGroupId→partition mapping, PartitionCount normalisation, immutability and dormancy gating, attribute helpers and renderers. |
| HT‑FIFO tests & integration: adapter/sqs_partitioning_test.go, adapter/sqs_partitioning_integration_test.go | Unit and wire-level integration tests covering partition mapping determinism/distribution, validation rules, dormancy gate, immutability and all-or-nothing SetQueueAttributes semantics. |
| Misc integration & idempotency tests: adapter/sqs_catalog_test.go, adapter/sqs_partitioning_integration_test.go, adapter/sqs_throttle_integration_test.go | Wire-level tests verifying throttle enforcement/Retry-After, bucket invalidation on metadata changes, batch semantics, and HT‑FIFO create/immutability/dormancy behavior. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Client as Client
  participant Server as SQSServer
  participant Catalog as QueueMetaStore
  participant Bucket as BucketStore
  participant Handler as QueueHandler

  Client->>Server: SendMessage / ReceiveMessage
  Server->>Catalog: Resolve QueueUrl → meta (throttle config, incarnation)
  Catalog-->>Server: Queue meta (exists / not found)
  alt queue exists
    Server->>Bucket: charge(queue, action, count, incarnation)
    alt tokens available
      Bucket-->>Server: allowed
      Server->>Handler: Dispatch to queue handler
      Handler-->>Server: Handler result
      Server-->>Client: 200 OK / response
    else throttled
      Bucket-->>Server: denied + retryAfter
      Server-->>Client: 400 Throttling + Retry-After header
    end
  else queue missing
    Server-->>Client: QueueDoesNotExist error
  end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


🚥 Pre-merge checks: ✅ 5 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main changes: design proposals for two SQS features (Phase 3.C throttling and 3.D split-queue FIFO), matching the documentation changes across the design proposal files and supporting plumbing. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 88.29%, above the 80.00% threshold. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces two design documents for the SQS adapter: one for per-queue throttling and another for split-queue FIFO partitioning. The throttling proposal aims to ensure tenant fairness and prevent resource exhaustion, while the split-queue FIFO design targets high-throughput requirements by distributing queues across multiple Raft groups. Review feedback identifies a critical backward compatibility risk in the proposed keyspace schema that would break existing data access. Additionally, there are concerns regarding potential performance bottlenecks from coarse-grained locking in the throttling store and a design flaw in the partitioned ReceiveMessage logic that could lead to 'false empty' responses for clients.

Comment on lines +49 to +51
A partition is identified by the tuple `(queueName, partitionIndex)` where `partitionIndex ∈ [0, N)`. Every existing message-keyspace key gains the `partitionIndex` as a fixed-width segment immediately after the queue name. Concretely, today's `!sqs|msg|data|<queue>|<gen>|<msgID>` becomes `!sqs|msg|data|<queue>|<partition>|<gen>|<msgID>`. The `<partition>` segment is `appendU64(partitionIndex)` (fixed 8 bytes BE), so:

- Single-partition FIFOs (and all Standard queues) write `partitionIndex = 0` and read with the same prefix as today — the schema is byte-identical for `N=1`.

high

The proposed key format !sqs|msg|data|<queue>|<partition>|<gen>|<msgID> is not backward compatible with the existing format. Inserting the 8-byte <partition> segment shifts the offsets for <gen> and <msgID>, meaning the new code will fail to find existing messages on disk. This introduces a significant operational risk and prevents rolling upgrades. Per repository rules, the design must detail mitigation strategies (e.g., a temporary 'bridge' or 'proxy' mode) and strategies for zero-downtime cutovers (e.g., dual-write proxy or logical replication) to avoid service interruption.

References
  1. When a design document identifies a significant operational risk, such as the inability to perform rolling upgrades, it must also detail potential mitigation strategies, like implementing a temporary "bridge" or "proxy" mode.
  2. When designing a production-grade system migration, the plan must consider and detail strategies for live or zero-downtime cutovers (e.g., blue/green, dual-write proxy, logical replication) to avoid service interruption.

Comment on lines +78 to +79
mu sync.Mutex
buckets map[bucketKey]*tokenBucket // (queueName, action) -> bucket

medium

Using a single sync.Mutex to protect the buckets map will likely become a performance bottleneck, as this lock must be acquired on every SendMessage and ReceiveMessage call. Per the general rules, frequently accessed fields on hot paths should prefer more granular locking or atomic operations to improve performance.

Suggested change:

```diff
-	mu      sync.Mutex
-	buckets map[bucketKey]*tokenBucket // (queueName, action) -> bucket
+	buckets sync.Map // map[bucketKey]*tokenBucket
```
References
  1. For frequently accessed fields that require monotonic updates, prefer atomic operations or granular locking over mutexes to improve performance on hot paths.

Comment on lines +154 to +157
A consumer making one ReceiveMessage call sees results from **only the partitions whose leader is the local node**. For a deployment where every partition has its own leader, a single client's ReceiveMessage hits one partition's worth of messages per call. This matches AWS HT-FIFO behaviour: clients are expected to spread their consumer pool across partitions, and an idle partition does not block a busy one.

For deployments that co-locate partitions on one Raft group, all partitions' leaders are the same node and a single ReceiveMessage scans all partitions in sequence. The scan is bounded by `MaxNumberOfMessages` and the existing per-partition page limit — no separate budget.


medium

Restricting ReceiveMessage to only scan partitions led by the local node is a significant departure from the SQS model. In a multi-node deployment, this can lead to "false empty" responses if a client hits a node that is not the leader for partitions containing messages. This forces clients to implement cluster-aware round-robin logic, which leaks internal sharding details and breaks the SQS abstraction.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 341b0deb14


Comment on lines +146 to +148
b. If this node is not the leader for that partition, skip (the
consumer's next call will land on a different node and that
leader will serve its partition).

P1: Proxy non-local partitions during ReceiveMessage

Step 2b explicitly skips partitions whose leader is not the local node, which means a consumer pinned to one endpoint will only ever drain that node’s leader-owned partitions while messages on other partitions can backlog indefinitely. This turns ReceiveMessage into node-local visibility rather than queue-wide visibility and breaks normal single-endpoint client deployments unless they add explicit endpoint round-robin behavior.


| `adapter/sqs.go` | After `authorizeSQSRequest`, call `bucketStore.charge(queueName, action, count)`. On reject, write the `Throttling` envelope and return. |
| `adapter/sqs_throttle_test.go` (new) | Unit tests for bucket math (edge cases: idle drift, burst, partial refill, batch over-charge, default-off). ~300 lines. |
| `adapter/sqs_throttle_integration_test.go` (new) | End-to-end: configure a queue with low limits, send N messages back-to-back, confirm the (N+1)th gets `Throttling` with `Retry-After`. ~150 lines. |
| `monitoring/registry.go` | Two new counter vectors: `sqs_throttled_requests_total{queue, action}` and `sqs_throttle_tokens_remaining{queue, action}`. |

P2: Define tokens_remaining as a gauge metric

This section defines sqs_throttle_tokens_remaining as a counter, but token-bucket remaining capacity must go up and down over time. If implemented as a counter, the value can only increase and any dashboard based on “trending toward zero” becomes meaningless, masking real throttling pressure; this metric needs to be specified as a gauge.


Four issues raised by Codex + Gemini Code Assist on PR #664:

3.D split-queue FIFO doc:

1. Backward-compatibility risk in keyspace (Codex P1 + Gemini high).
   The original §3.1 said "every existing key gains a <partition>
   segment" but also "the schema is byte-identical for N=1" — those
   are contradictory; inserting bytes shifts every downstream offset
   and breaks readback of existing data on disk. Rewrote §3.1 to
   make the key shape conditional on whether the queue is
   partitioned: legacy / Standard queues use today's keys
   byte-identically, partitioned queues use a new prefix
   (!sqs|msg|data|p|<queue>|<partition>|...) with an explicit "p|"
   discriminator that cannot collide with the legacy prefix even
   when partition=0. Constructor sketch updated.

2. ReceiveMessage "false empty" on non-local-leader partitions
   (Codex P1 + Gemini medium). Original §4.2 had clients pinned to
   one node only seeing partitions whose leader is local — messages
   on other partitions backlog forever and clients hit empty replies
   that look like "queue is drained". Rewrote §4.2 so ReceiveMessage
   proxies the per-partition fanout to the right leader via the
   existing leader-proxy machinery (extended to take a partition
   argument). Added a partitionOrder rotation seeded by RequestId so
   successive calls don't bias toward partition 0. Co-located
   deployments still pay nothing; spread deployments pay one extra
   hop per non-local partition only until MaxNumberOfMessages are
   collected.

3.C per-queue throttling doc:

3. Single sync.Mutex on the bucket store would be a hot-path
   contention point (Gemini medium). Switched §3.1 to a sync.Map
   for the bucket lookup (read-mostly access pattern) plus a
   per-bucket sync.Mutex for charge / refill. Cross-queue traffic
   never serialises on the same lock. Documented the three-step
   charge operation explicitly so readers see the locking story.

4. sqs_throttle_tokens_remaining must be a gauge, not a counter
   (Codex P2). Token budgets go up and down; a counter would mask
   exactly the depletion operators need to see. Updated both §4.1
   (files-touched table) and §7 (operational dashboard) to call
   the metric a gauge. Counter is correct for the rejections total.

Docs-only PR; no code changes, no implementation gate.
@bootjp
Owner Author

bootjp commented Apr 26, 2026

Pushed addressing all four review findings (Codex + Gemini converged):

3.D split-queue FIFO doc:

  • Keyspace backward-compat (Codex P1 + Gemini high): §3.1 rewritten — legacy queues use today's keys byte-identically, partitioned queues use a new !sqs|msg|data|p|... prefix with an explicit p| discriminator that cannot collide with legacy keys.
  • ReceiveMessage false-empty (Codex P1 + Gemini medium): §4.2 rewritten — proxies non-local-leader partition scans via the existing leader-proxy (extended to take a partition argument) so a consumer pinned to one endpoint still sees every partition's messages. Added partitionOrder rotation to avoid head-of-line bias.

3.C per-queue throttling doc:

  • Hot-path lock contention (Gemini medium): §3.1 switched to sync.Map lookup + per-bucket sync.Mutex for charge/refill. Cross-queue traffic never serialises on the same lock. Three-step charge operation documented explicitly.
  • Metric type (Codex P2): sqs_throttle_tokens_remaining is now correctly a gauge (token budgets go up and down). sqs_throttled_requests_total stays a counter (see the metric sketch below).

Docs-only changes; no implementation gate. +64/-22 lines across the two files.
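
For concreteness, the two metrics as finally specified, sketched with the Prometheus Go client (variable names and Help strings are assumptions; the metric names, types, and {queue, action} labels are from the thread):

```go
package monitoring

import "github.com/prometheus/client_golang/prometheus"

var (
	// Counter: monotonically increasing count of rejected requests.
	sqsThrottledRequestsTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "sqs_throttled_requests_total",
		Help: "Requests rejected by per-queue throttling.",
	}, []string{"queue", "action"})

	// Gauge: token balances rise on refill and fall on charge, so a
	// counter would hide exactly the depletion operators need to see.
	sqsThrottleTokensRemaining = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "sqs_throttle_tokens_remaining",
		Help: "Current token balance per (queue, action) bucket.",
	}, []string{"queue", "action"})
)

func init() {
	prometheus.MustRegister(sqsThrottledRequestsTotal, sqsThrottleTokensRemaining)
}
```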

@claude review

@claude

claude Bot commented Apr 26, 2026

Claude finished @bootjp's task in 3m 1s


Review

  • Read updated design documents
  • Verify fix: keyspace backward-compat (3.D)
  • Verify fix: ReceiveMessage false-empty (3.D)
  • Verify fix: hot-path lock contention (3.C)
  • Verify fix: metric type correction (3.C)
  • Overall assessment

All four issues from the Codex/Gemini review have been correctly addressed. New issues found below, one of them medium-severity.


All four previous issues: resolved ✓

3.D — Keyspace backward-compat (Gemini high + Codex P1)
The p| discriminator approach cleanly separates the new partitioned keyspace from legacy keys. Legacy paths stay byte-identical; partitioned queues get !sqs|msg|data|p|<queue>|<partition>|…. The reaper enumerating both prefixes is a good forward-compatibility detail.

3.D — ReceiveMessage false-empty (Gemini medium + Codex P1)
§4.2 now proxies non-local-leader partitions via the existing proxyToLeader machinery (extended with a partition argument) and the partitionOrder rotation seeded by RequestId is a nice head-of-line bias fix. The X-Elastickv-Receive-Partition: <k> header approach is clean and consistent with the existing proxy pattern.

3.C — Hot-path lock contention (Gemini medium)
sync.Map for the bucket registry (lock-free reads) + per-bucket sync.Mutex for charge/refill is the right pattern. The three-step charge operation is well-documented. The race-on-insert via LoadOrStore is handled correctly (both racers agree on config).
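
As a reference for the implementation PR, a minimal sketch of that three-step charge path. Only the sync.Map registry, the per-bucket mutex, and the benign LoadOrStore race are from the doc; field and function names are assumptions.

```go
package throttle

import (
	"sync"
	"time"
)

type bucketKey struct {
	queue  string
	action string // "Send" | "Receive" | "*"
}

type tokenBucket struct {
	mu         sync.Mutex
	tokens     float64
	capacity   float64
	refillRate float64 // tokens per second
	lastRefill time.Time
}

type bucketStore struct {
	buckets sync.Map // map[bucketKey]*tokenBucket
}

// charge is the three-step operation: lock-free lookup, LoadOrStore on
// miss, then per-bucket locking for refill and take-or-reject.
func (s *bucketStore) charge(key bucketKey, n float64, build func() *tokenBucket) bool {
	v, ok := s.buckets.Load(key) // step 1: read-mostly fast path
	if !ok {
		// Step 2: the one-shot insert race is benign; both racers build
		// from the same queue meta and agree on the configuration.
		v, _ = s.buckets.LoadOrStore(key, build())
	}
	b := v.(*tokenBucket)

	// Step 3: per-bucket lock, so cross-queue traffic never serialises
	// on a shared mutex.
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens = min(b.capacity, b.tokens+b.refillRate*now.Sub(b.lastRefill).Seconds())
	b.lastRefill = now
	if b.tokens < n { // all-or-nothing: batches charge by entry count
		return false
	}
	b.tokens -= n
	return true
}
```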

3.C — Metric type (Codex P2)
sqs_throttle_tokens_remaining is correctly a gauge in both §4.1 (files-touched table) and §7 (operational section). Both sites updated.


New issues in the updated doc

[Medium] docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md §8.1 — stale failure mode description

The fix to §4.2 eliminated the false-empty problem server-side (proxy fanout handles it), but §8.1 still says:

"Consumer never sees a partition's leader: a consumer that always lands on node A sees only the partitions whose leader is A … Mitigation: document that clients should round-robin endpoints."

This directly contradicts §4.2 — the "round-robin endpoints" mitigation is no longer correct or necessary for this failure mode, since the proxy now handles it transparently. An implementor reading §8.1 in isolation would build the wrong mental model and might even suppress the proxy logic thinking clients are expected to handle distribution themselves.

The failure mode to document now is the performance cost, not the correctness risk: proxy latency under spread deployments (one extra RTT per non-local partition), and the trade-off that a consumer asking for 1 message touches at most 1 remote leader while an "is queue empty?" poll costs N proxy round-trips.

[Low] docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md §3.1 — sqsMsgDataKey variadic signature is a footgun

The proposed constructor:

```go
func sqsMsgDataKey(queueName string, gen uint64, messageID string, partitions ...uint32) []byte {
```

Using a variadic for the partition discriminator has two issues as implementation guidance:

  1. sqsMsgDataKey(q, gen, id, p0, p1) silently ignores p1 — a caller could accidentally pass the wrong count and the compiler won't catch it.
  2. Variadic calls allocate even for the zero-argument case in some Go versions, which matters on the legacy hot path.

The doc could instead name two distinct helpers (legacyMsgDataKey / partitionedMsgDataKey) with the thin dispatcher staying internal. The variadic form is fine as a unified constructor for now but the implementation PR should be aware of these edge cases.

[Low] docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md §3.1 — p| discriminator safety constraint is implicit

The doc correctly states the discriminator prevents collision, but doesn't state why it's safe: valid SQS queue names are alphanumeric + hyphen + underscore and cannot contain |, so no queue name can begin with p|, making the prefix unambiguous. One sentence in §3.1 would nail this closed explicitly — the implementation PR's name validator needs to enforce the | prohibition on queue names (which is standard SQS behaviour, but worth calling out as load-bearing here).

[Low] docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md §8.5 — upgrade gate mechanism underspecified

"The catalog rejects CreateQueue with PartitionCount > 1 until every node in the cluster reports the new feature flag" is the gate-of-no-return safety mechanism, but the doc doesn't describe how a node reports its feature flag or how the catalog queries all live nodes. The cross-reference to "§3.3.2 admin-forwarding upgrade gate from the admin dashboard design" is opaque without a link. Since this is the safety rail against accidentally creating partitioned queues in a mixed-version cluster, the mechanism deserves a sentence or two (e.g. "each node's health endpoint advertises a capabilities set; the catalog reads all registered nodes from the distribution layer's node registry and gates on all of them reporting HTFIFO").

[Low] docs/design/2026_04_26_proposed_sqs_per_queue_throttling.md §3.4 — Retry-After: 1 lies for slow-refill queues

The doc justifies the constant Retry-After: 1 as "at the configured refill rate, one second is enough for at least one fresh token" — but this is only true when refillRate >= 1 token/s. A queue configured with SendRefillPerSecond = 0.1 needs 10 seconds for the next token; returning Retry-After: 1 causes 9 unnecessary 400 retries before the client finally gets through. The §3.4 table comment should note this edge case and suggest max(1, ceil(1/refillRate)) seconds, or cap the minimum SendRefillPerSecond at 1 to keep the constant valid.


Summary

The four original issues are cleanly resolved. The §8.1 stale failure mode is the only finding I'd call blocking — it directly contradicts §4.2 and gives implementors the wrong mental model. The remaining three are low-severity clarifications that would improve the implementation PRs' accuracy.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6fa02faa35



## 8. Failure Modes and Edge Cases

1. **Consumer never sees a partition's leader**: a consumer that always lands on node A sees only the partitions whose leader is A. If partitions are spread evenly, this is `1/N` of the queue. Mitigation: AWS SDK consumers naturally distribute across endpoints; for elastickv, document that clients should round-robin endpoints when consuming a partitioned FIFO.

P1: Remove node-local receive behavior from failure guidance

Fresh evidence: §8.1 explicitly says an endpoint-pinned consumer only sees 1/N of partitions, which contradicts §4.2’s required proxy fanout behavior. Keeping this guidance in the proposal makes the implementation ambiguous, and following it would regress ReceiveMessage to node-local visibility where messages on remote-leader partitions can backlog indefinitely for single-endpoint consumers.


Comment on lines +109 to +111
if meta.PartitionCount <= 1 {
return 0
}

P2: Align partition routing with perQueue throughput mode

Section 3.2 says FifoThroughputLimit=perQueue should effectively collapse routing to one partition, but the partitionFor pseudocode only checks PartitionCount and will hash across all partitions whenever PartitionCount > 1. If this is implemented as written, perQueue mode cannot behave as documented and will still distribute groups across partitions.


| JSON | HTTP 400, body `{"__type":"Throttling","message":"Rate exceeded for queue '<name>' action '<action>'"}`, header `x-amzn-ErrorType: Throttling`, header `Retry-After: 1` |
| Query | HTTP 400, body `<ErrorResponse><Error><Type>Sender</Type><Code>Throttling</Code><Message>...</Message></Error><RequestId>...</RequestId></ErrorResponse>`, headers as above |

`Retry-After: 1` is the conservative default — at the configured refill rate, one second is enough for at least one fresh token. A future iteration could compute the precise wait from `(1 - currentTokens) / refillRate` but the constant is enough for SDK backoff logic.

P2: Compute Retry-After from refill rate

This claim assumes every bucket refills at least one token per second, but the proposed validator allows any positive refill rate. With configurations like RefillPerSecond=0.1, a Retry-After: 1 response guarantees another immediate throttle, creating avoidable retry pressure and misleading backoff behavior; the wait should be derived from the bucket’s refill rate (or the config should enforce a minimum of 1).


Five issues raised by Claude on PR #664 (one medium, four low — all
docs-only edits to the two proposal docs):

3.D split-queue FIFO doc:

1. (Medium) §8.1 stale failure-mode description contradicted §4.2.
   Original §8.1 said "Consumer never sees a partition's leader →
   mitigation: round-robin endpoints" — but §4.2 was rewritten earlier
   in this PR to proxy non-local partitions server-side, eliminating
   the false-empty risk entirely. Implementors reading §8.1 in
   isolation would have built the wrong mental model and possibly
   suppressed the proxy logic. §8.1 now describes the actual remaining
   failure mode — proxy round-trip latency under spread deployments —
   and explains that single-partition / co-located deployments pay
   nothing.

2. (Low) §3.1 variadic sqsMsgDataKey was a footgun. The variadic
   form would silently accept extra args without compiler diagnosis.
   Replaced with two named constructors (legacyMsgDataKey /
   partitionedMsgDataKey) and an explicit dispatch at the call site
   based on meta.PartitionCount > 1.

3. (Low) §3.1 p| discriminator safety constraint was implicit.
   Added an explicit paragraph explaining why the prefix cannot
   collide with the legacy keyspace: queue names admit only
   [A-Za-z0-9_-] (+ optional .fifo suffix), the validator must
   continue to reject |, and base32 segments never start with the
   literal ASCII p. Future relaxations of the name validator have
   to revisit this prefix scheme.

4. (Low) §8.5 upgrade-gate mechanism was underspecified. Added the
   concrete protocol: each node's /sqs_health endpoint advertises
   capabilities: ["htfifo"] once this PR's code is in the binary;
   the catalog reads the live node set from the distribution
   layer's node registry, polls /sqs_health on each, and gates
   PartitionCount > 1 on every node reporting htfifo. Fail-closed
   on timeouts. References PR #644 as the prior art for the same
   pattern.

3.C per-queue throttling doc:

5. (Low) §3.4 Retry-After: 1 lied for slow-refill queues. A queue
   with SendRefillPerSecond = 0.1 needs 10 s for the next token;
   returning 1 burned 9 unnecessary client retries before the
   request would succeed. Replaced with a per-queue computation:
   ceil((1 - currentTokens) / refillRate), floored at 1 s to
   match Retry-After's integer-second granularity. Fast-refill
   queues (>= 1 RPS) preserve the existing Retry-After: 1
   behaviour.

Docs-only PR; no code changes, no implementation gate.
@bootjp
Owner Author

bootjp commented Apr 26, 2026

Pushed addressing all five findings from your follow-up review (1 medium, 4 low):

3.D split-queue FIFO doc:

  • (medium) §8.1 stale failure-mode description rewritten — was contradicting §4.2's proxy fix. Now describes the actual remaining failure (proxy RTT cost under spread deployment) instead of the eliminated false-empty risk.
  • (low) §3.1 variadic sqsMsgDataKey replaced with two named constructors (legacyMsgDataKey / partitionedMsgDataKey) + explicit call-site dispatch — no silent argument loss.
  • (low) §3.1 p| discriminator safety made explicit: queue-name validator's | prohibition + base32-encoded segments never starting with literal p together guarantee no collision.
  • (low) §8.5 upgrade-gate mechanism now concrete: /sqs_health advertises capabilities: ["htfifo"]; catalog polls every registered node, fail-closed on timeout. References PR #644 (admin: AdminForward follower-side client + handler integration (P1, partial)) as prior art.

3.C per-queue throttling doc:

  • (low) §3.4 Retry-After now computed per-queue from ceil((1 - currentTokens) / refillRate) (floored at 1 s for HTTP integer-second granularity) instead of the constant 1. Slow-refill queues no longer cause busy-loops.

Docs-only changes; no implementation gate. +24/-13 lines net across the two files.

@claude review

@claude

claude Bot commented Apr 26, 2026

Claude finished @bootjp's task in 5m 50s


Review

  • Read updated design documents
  • Verify fix: §8.1 stale failure-mode description (3.D) — medium
  • Verify fix: §3.1 named constructors replacing variadic (3.D) — low
  • Verify fix: §3.1 p| discriminator safety explicit (3.D) — low
  • Verify fix: §8.5 upgrade-gate mechanism concrete (3.D) — low
  • Verify fix: §3.4 Retry-After computed from refill rate (3.C) — low
  • Check for new issues
  • Overall assessment

All five issues from the last Claude review are correctly resolved. One pre-existing Codex P2 finding from the second-pass review was not in the trigger's scope but remains open; surfacing it here. Two low-severity findings are new.


All five previous issues: resolved ✓

3.D — §8.1 stale failure-mode description (medium)
The failure mode now reads "Proxy RTT under spread deployment" with the explicit note that §4.2 proxies them server-side so false-empty is gone. The N-proxy-round-trip pathological case (asking "is anything here?" against an empty queue) and the long-poll mitigation are both called out. No contradiction with §4.2.

3.D — §3.1 variadic sqsMsgDataKey (low)
Two distinct named constructors: legacyMsgDataKey(queueName, gen, messageID) and partitionedMsgDataKey(queueName, partition, gen, messageID). Call-site dispatch on meta.PartitionCount > 1. No variadic, no silent argument loss.

3.D — §3.1 p| discriminator safety (low)
The argument is now explicit: (1) queue names are restricted to [A-Za-z0-9_-] so | can never appear; (2) the generation field is base32-encoded so it uses only uppercase A-Z and digits 2–7, meaning base32 cannot produce |, which prevents any legacy-key prefix scan from matching a new partitioned-key scan prefix; (3) a note that the implementation PR's name validator must continue to reject |. The argument is present and ultimately correct.
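
To make the two-constructor scheme concrete, a sketch of the key builders under the layouts quoted in this thread (fixed-width big-endian partition segment, base32 generation). Helper names beyond legacyMsgDataKey / partitionedMsgDataKey are assumptions, and the byte layout is illustrative rather than authoritative.

```go
package keys

import (
	"encoding/base32"
	"encoding/binary"
)

// legacyMsgDataKey keeps today's layout byte-identical:
//   !sqs|msg|data|<queue>|<gen>|<msgID>
func legacyMsgDataKey(queueName string, gen uint64, messageID string) []byte {
	k := append([]byte("!sqs|msg|data|"), queueName...)
	k = append(k, '|')
	k = append(k, encodeGen(gen)...)
	return append(append(k, '|'), messageID...)
}

// partitionedMsgDataKey uses the discriminated prefix. "p|" cannot collide
// with a legacy key: queue names may not contain '|', and base32 output
// (A-Z, 2-7) never begins with a lowercase 'p'.
//   !sqs|msg|data|p|<queue>|<partition>|<gen>|<msgID>
func partitionedMsgDataKey(queueName string, partition uint32, gen uint64, messageID string) []byte {
	k := append([]byte("!sqs|msg|data|p|"), queueName...)
	k = append(k, '|')
	k = binary.BigEndian.AppendUint64(k, uint64(partition)) // fixed 8 bytes, big-endian
	k = append(k, '|')
	k = append(k, encodeGen(gen)...)
	return append(append(k, '|'), messageID...)
}

// encodeGen renders the generation as unpadded uppercase base32.
func encodeGen(gen uint64) []byte {
	var raw [8]byte
	binary.BigEndian.PutUint64(raw[:], gen)
	enc := base32.StdEncoding.WithPadding(base32.NoPadding)
	dst := make([]byte, enc.EncodedLen(len(raw)))
	enc.Encode(dst, raw[:])
	return dst
}
```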

3.D — §8.5 upgrade-gate mechanism (low)
Concrete now: /sqs_health advertises capabilities: ["htfifo"]; the catalog's CreateQueue handler polls every registered node from the distribution layer's node registry (same registry as proxyToLeader); fail-closed on timeout; explicit prior art reference to PR #644's AdminForward gate.

3.C — §3.4 Retry-After formula (low)
Formula is ceil((1.0 - currentTokens) / refillRate) with max(1, ...) floor, inline example: refillRate=0.1 → Retry-After: 10. The "validator keeps refillRate > 0" divide-by-zero note is a good touch.


New findings

[Medium] docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md §3.3 — FifoThroughputLimit=perQueue not implemented in partitionFor

This was flagged as Codex P2 in the second-pass review but was not in the trigger's scope (trigger addressed Claude's five findings only). It remains open in the doc.

§3.2 documents FifoThroughputLimit:

"the alternative 'perQueue' reduces the partition assignment to a single partition (effectively N=1)"

But §3.3's partitionFor pseudocode only gates on meta.PartitionCount <= 1:

```go
if meta.PartitionCount <= 1 {
    return 0
}
// falls through to hash for any PartitionCount > 1
```

A queue created with PartitionCount=8 + FifoThroughputLimit=perQueue would hash across all 8 partitions, directly contradicting §3.2's documented semantics. The fix is one line:

```go
// Fixed guard:
if meta.PartitionCount <= 1 || meta.FifoThroughputLimit == "perQueue" {
    return 0
}
```

[Low] docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md §4.1 — pseudocode still uses the old sqsMsgDataKey name

§3.1 renamed the constructor, but §4.1 step 5 still references the old form:

5. Build the OCC OperationGroup with partition-aware keys:
     dataKey  = sqsMsgDataKey(queue, partitionIndex, gen, msgID)

An implementor reading §4.1 in isolation gets conflicting guidance vs. §3.1. The pseudocode should dispatch via the named constructors introduced in §3.1.

[Low] docs/design/2026_04_26_proposed_sqs_per_queue_throttling.md §3.4 — Retry-After formula hardcodes 1.0 regardless of batch size

The fix correctly computes Retry-After from refillRate. But the formula numerator is always 1.0:

```go
secondsToNextToken := math.Ceil((1.0 - currentTokens) / refillRate)
```

For a SendMessageBatch of 10 entries (charge = 10) against a bucket with 0 tokens and refillRate = 1.0, this gives Retry-After: 1 — but 10 seconds are needed for the batch to succeed. The client burns 9 retries before accumulating enough tokens. The formula should use the requested count instead of the constant 1.0:

```go
secondsToNextToken := math.Ceil((float64(requestedCount) - currentTokens) / refillRate)
```

This matters most for large batches with slow-to-medium refill rates; for fast refill or single operations the constant-1 floor masks the gap anyway.


Summary

All five issues from the last review are cleanly fixed. The FifoThroughputLimit=perQueue routing gap (Codex P2 from the second-pass review, not previously in Claude's scope) is the only finding here I'd call blocking before the implementation PRs — it means the routing logic in §3.3 contradicts the semantics documented in §3.2. The other two are low-severity doc cleanups.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f289d6c291


Comment on lines +109 to +110
2. On miss, build the bucket from queue meta and `LoadOrStore` it (one-shot insert race tolerated — both racers will agree on the same configuration).
3. Acquire the bucket's own `mu`, refill based on elapsed time, take or reject the requested tokens, release `mu`.

P1: Invalidate cached buckets on throttle config updates

The design caches each (queue, action) bucket on first use via LoadOrStore, but it never defines a refresh/invalidation path when SetQueueAttributes changes throttle values. For active queues this means updated limits can stay stale until idle eviction or failover, so operators may keep over-throttling or under-throttling traffic long after they reconfigure the queue.


Comment on lines +181 to +182
secondsToNextToken := math.Ceil((1.0 - currentTokens) / refillRate)
retryAfter := max(1, int(secondsToNextToken)) // never less than 1

P2: Base Retry-After on requested token cost

The proposed Retry-After calculation always targets the next single token, but this proposal charges batch requests by entry count. When a large SendMessageBatch is throttled, clients can be told to retry in 1 second even though many more seconds are needed to refill enough tokens for that same batch, which guarantees repeated throttles and adds avoidable retry pressure.


Comment on lines +202 to +205
`kv/shard_router.go` today routes by queue name. With partitions, the routing key becomes `(queueName, partitionIndex)`. The existing `--raftSqsMap` flag accepts entries keyed by queue name; the syntax extends to accept a partition suffix:

```
--raftSqsMap "orders.fifo:8=group-7,group-8,group-9,group-10,group-11,group-12,group-13,group-14"
```

P1: Split partition routing from raftSqsMap endpoint mapping

This section assigns queue/partition placement to --raftSqsMap, but that flag is currently defined as raftAddr=sqsAddr endpoint mapping for leader proxying. Reusing it for partition-to-group routing makes the config surface ambiguous and can break proxy address resolution, especially in the receive fanout flow that depends on correct leader endpoint lookup.


Three findings (1 medium, 2 low):

3.D split-queue FIFO doc:

1. (Medium) §3.3 partitionFor missed FifoThroughputLimit=perQueue.
   §3.2 documents that perQueue "reduces routing to a single partition
   regardless of PartitionCount" but the §3.3 pseudocode only checked
   PartitionCount <= 1. A queue created with PartitionCount=8 +
   FifoThroughputLimit=perQueue would hash across all 8 partitions,
   directly contradicting the documented semantics. Updated the early-
   return guard to:
       if meta.PartitionCount <= 1 || meta.FifoThroughputLimit == "perQueue" {
           return 0
       }
   Codex P2 caught this in the second-pass review but it was outside
   the scope of the previous fix iteration.

2. (Low) §4.1 step 5 still referenced the old variadic
   sqsMsgDataKey(queue, partitionIndex, gen, msgID). §3.1 had already
   moved to two named constructors (legacyMsgDataKey /
   partitionedMsgDataKey) but §4.1's pseudocode still used the
   variadic form, contradicting §3.1. Replaced with explicit if/else
   dispatch through the named constructors so an implementor reading
   §4.1 in isolation gets the same naming as §3.1.

3.C per-queue throttling doc:

3. (Low) §3.4 Retry-After formula's numerator was hardcoded to 1.0
   regardless of charge size. SendMessageBatch of 10 entries against
   a 0-token bucket with refillRate = 1.0 would have returned
   Retry-After: 1 (insufficient) instead of 10. Updated formula to:
       needed := float64(requestedCount) - currentTokens
       secondsToNextRefill := math.Ceil(needed / refillRate)
       retryAfter := max(1, int(secondsToNextRefill))
   requestedCount is the same value §3.3's charge step uses
   (1 for single, len(Entries) for batch).

Docs-only PR; no code changes, no implementation gate.
@bootjp
Owner Author

bootjp commented Apr 26, 2026

Pushed addressing all three findings from your third review (1 medium, 2 low):

3.D split-queue FIFO doc:

  • (medium) §3.3 partitionFor now also short-circuits when FifoThroughputLimit == "perQueue" — fixes the contradiction with §3.2's documented semantics. Codex P2 from the second-pass review, now resolved.
  • (low) §4.1 step 5 pseudocode now dispatches through the §3.1 named constructors (legacyMsgDataKey / partitionedMsgDataKey) instead of the old variadic form. No more conflicting guidance between §3.1 and §4.1.

3.C per-queue throttling doc:

  • (low) §3.4 Retry-After formula now uses requestedCount instead of the hardcoded 1.0 numerator. SendMessageBatch of 10 against refillRate=1.0 correctly returns Retry-After: 10 instead of 1.

Docs-only changes; +18/-7 lines net.

@claude review

@claude

claude Bot commented Apr 26, 2026

Claude finished @bootjp's task in 2m 9s


Review

  • Read updated design documents
  • Verify fix: §3.3 partitionFor FifoThroughputLimit check (medium)
  • Verify fix: §4.1 step 5 named constructor dispatch (low)
  • Verify fix: §3.4 Retry-After uses requestedCount (low)
  • Check for new issues (including open Codex P1 findings)
  • Overall assessment

All three issues from the trigger are correctly resolved. Two Codex P1 findings from the previous review remain open and are surfaced below (they were not in scope of this push). Two new low-severity issues found.


All three previous issues: resolved ✓

3.D — §3.3 FifoThroughputLimit=perQueue routing (medium)
The guard is now:

```go
if meta.PartitionCount <= 1 || meta.FifoThroughputLimit == "perQueue" {
    return 0
}
```

The comment explicitly documents that perQueue collapses all groups to partition 0 regardless of PartitionCount, and calls out the Codex P2 catch. The contradiction with §3.2 is resolved.

3.D — §4.1 step 5 named constructor dispatch (low)
Step 5 now uses the explicit if meta.PartitionCount > 1 dispatch branching to partitionedMsgDataKey / legacyMsgDataKey (and analogues for visKey / groupKey). No more reference to the old variadic sqsMsgDataKey. §3.1 and §4.1 are now consistent.

3.C — §3.4 Retry-After uses requestedCount (low)
Formula is now:

```go
needed              := float64(requestedCount) - currentTokens
secondsToNextRefill := math.Ceil(needed / refillRate)
retryAfter          := max(1, int(secondsToNextRefill))
```

Both cases are correctly handled: a SendMessageBatch of 10 against refillRate=1.0 gives Retry-After: 10; a single SendMessage against refillRate=0.1 also gives Retry-After: 10. The floor-1 and divide-by-zero notes are preserved.
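
Pulled together as one function, the final formula looks roughly like this (a sketch; the function name and signature are assumptions):

```go
package throttle

import "math"

// retryAfterSeconds computes the §3.4 Retry-After value. requestedCount is
// the same value the charge step uses (1 for single operations,
// len(Entries) for batches); the validator guarantees refillRate > 0.
func retryAfterSeconds(requestedCount int, currentTokens, refillRate float64) int {
	needed := float64(requestedCount) - currentTokens
	secs := int(math.Ceil(needed / refillRate))
	if secs < 1 {
		return 1 // Retry-After has integer-second granularity; never below 1
	}
	return secs
}
```

So retryAfterSeconds(10, 0, 1.0) and retryAfterSeconds(1, 0, 0.1) both return 10, matching the examples above.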


Open Codex P1 findings from the previous round (not addressed in this push)

These were raised against f289d6c291 and are still open in the current doc.

[Medium] docs/design/2026_04_26_proposed_sqs_per_queue_throttling.md §3.1 — bucket cache not invalidated on SetQueueAttributes

§3.1 describes LoadOrStore on first use: "build the bucket from queue meta and LoadOrStore it." Once a bucket is stored, the doc describes no path for invalidating or refreshing it when SetQueueAttributes changes the throttle config. A queue created with SendRefillPerSecond = 10, then lowered to 1 via SetQueueAttributes, continues to allow 10 RPS until the idle-eviction sweep removes the old bucket entry (default 1h window). An operator who lowers limits to stop a noisy tenant would see the effect delayed by up to an hour.

The fix is a sentence in §3.1: SetQueueAttributes must call buckets.Delete(key) for all action-keys of the updated queue after committing the meta change to Raft, so the next request re-initialises the bucket from the new config.

[Medium] docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md §5 — --raftSqsMap reuse conflicts with endpoint-address semantics

§5 proposes extending --raftSqsMap with a new queueName:N=group1,...,groupN syntax. The existing flag maps raftAddr=sqsAddr (endpoint routing for proxyToLeader). Reusing the same flag for a completely different concern — partition-to-Raft-group assignment — makes the config surface ambiguous: a single flag entry must now be parsed as either an address mapping or a partition assignment, and the proxyToLeader fanout in §4.2 depends on correctly resolving the per-partition leader's SQS endpoint from this same flag. Mixing the two intents in one parsing path risks a subtle bug where a partitioned-queue config entry is accidentally interpreted as a raftAddr=sqsAddr pair (or vice versa), silently producing the wrong proxy target.

A new flag (e.g. --raftSqsFifoMap) or a clearly separated config section avoids the parsing collision and lets the two concerns evolve independently.


New findings

[Low] docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md §9 — no test case for FifoThroughputLimit=perQueue

§9's unit test list covers PartitionCount <= 1 and empty MessageGroupId fast-paths but does not include a test that verifies FifoThroughputLimit=perQueue routes to partition 0 even when PartitionCount = 8. The perQueue short-circuit is the fix this push added to §3.3 — without an explicit test it could silently regress. One test case: partitionFor({PartitionCount: 8, FifoThroughputLimit: "perQueue"}, "any-group-id") == 0.

[Low] docs/design/2026_04_26_proposed_sqs_per_queue_throttling.md §4.3 — "sharded by MessageGroupId" is the wrong key

§4.3 describes the forward-compatibility path for Phase 3.D:

"each partition gets its own bucket, sharded by MessageGroupId"

The bucket key should be (queueName, partitionID), not (queueName, MessageGroupId). MessageGroupId is the input to partitionFor — the partition index it maps to is what identifies which leader owns the bucket. Using MessageGroupId as the key would create one bucket per group value (unbounded, attacker-amplifiable map size); using partitionID creates exactly PartitionCount buckets per queue (bounded by design). The §4.3 sentence should read "each partition gets its own bucket, keyed by (queueName, partitionID)."


Summary

All three fixes from this push are cleanly applied. The two Codex P1 findings from the previous review — bucket cache invalidation on SetQueueAttributes (3.C) and --raftSqsMap parsing ambiguity (3.D) — remain the only items I'd consider blocking before implementation PRs. Both are design-level gaps that implementation code will have to resolve anyway; better to get the doc right first. The two new findings are low-severity doc corrections.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d424d18d7c


Comment on lines +93 to +94
queue string
action string // "Send" | "Receive" | "*"

P1: Key throttle buckets by queue generation

The bucket key is defined as only (queue, action), so a deleted/recreated queue with the same name will reuse the old in-memory bucket state until idle eviction, inheriting stale token balance and potentially stale limits. In practice this can throttle a brand-new queue immediately (or let it burst unexpectedly) after DeleteQueue/recreate, which breaks queue incarnation isolation; include generation (or another meta version) in the key or define explicit invalidation on queue lifecycle changes.


Comment on lines +185 to +189
3. For each partitionIndex in partitionOrder, until MaxNumberOfMessages
are collected or every partition has been tried:
a. Resolve the leader for (queue, partitionIndex).
b. If this node is the leader: scan locally, deliver candidates.
c. Otherwise: forward the request to the leader-of-partition via

P1: Bound per-partition proxy calls by shared wait budget

This flow fans out ReceiveMessage across partitions by forwarding requests partition-by-partition, but it does not require reducing WaitTimeSeconds across hops. If each forwarded call long-polls independently, an empty queue can take roughly PartitionCount × WaitTimeSeconds, which can exceed SQS-style wait expectations and cause client/LB timeouts; the design should require a single request-level deadline and pass only remaining budget to each partition probe.
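
One hedged way to implement the shared budget this finding asks for: derive a single deadline from the client's WaitTimeSeconds and pass only the remaining slice to each partition probe. receiveFromPartition and the early-return policy here are assumptions, not part of the proposal.

```go
package sqsadapter

import (
	"context"
	"time"
)

type Message struct{ Body string }

// receiveFromPartition is assumed to proxy to the partition's leader and
// long-poll for at most wait; stubbed here so the sketch stands alone.
func receiveFromPartition(ctx context.Context, partition uint32, wait time.Duration) ([]Message, error) {
	return nil, nil
}

// receiveFanout probes partitions in rotated order under one request-level
// deadline, so an empty queue costs at most waitTime, not N x waitTime.
func receiveFanout(ctx context.Context, waitTime time.Duration, order []uint32) ([]Message, error) {
	deadline := time.Now().Add(waitTime)
	ctx, cancel := context.WithDeadline(ctx, deadline)
	defer cancel()

	var out []Message
	for _, p := range order {
		remaining := time.Until(deadline)
		if remaining <= 0 {
			break // budget exhausted: return what we have, possibly nothing
		}
		msgs, err := receiveFromPartition(ctx, p, remaining)
		if err != nil {
			return nil, err
		}
		out = append(out, msgs...)
		if len(out) > 0 {
			break // or keep collecting up to MaxNumberOfMessages (elided)
		}
	}
	return out, nil
}
```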


Four findings (2 medium, 2 low):

3.C per-queue throttling doc:

1. (Medium) §3.1 missed bucket cache invalidation on SetQueueAttributes.
   The doc described LoadOrStore on first use but no path for refreshing
   the in-memory bucket when SetQueueAttributes changes the throttle
   config. An operator lowering a limit to stop a noisy tenant would
   wait up to the idle-eviction window (1h default) before the change
   took effect. Added a paragraph: SetQueueAttributes calls
   buckets.Delete(key) for every action key after the Raft commit so
   the next request rebuilds from the new meta. The race with
   LoadOrStore is benign (the rebuilt bucket starts at full capacity,
   same semantics as the failover case).

2. (Low) §4.3 forward-compat sentence used the wrong key for the
   per-partition bucket. "sharded by MessageGroupId" was wrong —
   MessageGroupId is the input to partitionFor, not the bucket key.
   Using it directly would create one bucket per unique group value
   (unbounded, attacker-amplifiable). The bucket key is
   (queueName, partitionID) which is bounded by PartitionCount.
   Replaced the sentence with the correct keying and added the
   bucketKey-grows-a-partition-field note.

3.D split-queue FIFO doc:

3. (Medium) §5 reused --raftSqsMap for partition-to-Raft-group
   assignment. That flag already maps raftAddr=sqsAddr for
   proxyToLeader's endpoint resolution; overloading the same parser
   with partition assignments creates a parsing ambiguity that could
   silently produce the wrong proxy target in §4.2's fanout. Replaced
   with a new dedicated --sqsFifoPartitionMap flag. The existing
   --raftSqsMap is unchanged. Updated §11's rollout table accordingly.

4. (Low) §9 unit-test list was missing a case for
   FifoThroughputLimit=perQueue routing. The §3.3 short-circuit
   guard (added in the previous fix iteration) would silently
   regress without a test. Added: "FifoThroughputLimit=perQueue with
   PartitionCount=8 routes every group ID to partition 0."

Docs-only PR; no code changes, no implementation gate.
@bootjp
Owner Author

bootjp commented Apr 26, 2026

Pushed a commit addressing all four findings from your fourth review (2 medium, 2 low):

3.C per-queue throttling doc:

  • (medium) §3.1 now requires SetQueueAttributes to call buckets.Delete(key) after the Raft commit so a config change takes effect on the next request, not after the 1h idle-eviction window. Race with LoadOrStore is benign (rebuilt bucket starts at full capacity, same as failover).
  • (low) §4.3 forward-compat fixed: bucket key for split-FIFO is (queueName, partitionID), not MessageGroupId — the latter would create unbounded buckets per group value.

3.D split-queue FIFO doc:

  • (medium) §5 now uses a new dedicated --sqsFifoPartitionMap flag for partition-to-Raft-group assignment instead of overloading --raftSqsMap (which keeps doing endpoint mapping for proxyToLeader). Eliminates the parsing ambiguity. §11 rollout table updated.
  • (low) §9 unit test list now includes FifoThroughputLimit=perQueue with PartitionCount=8 routes every group to partition 0 to lock down the §3.3 short-circuit guard against regression.

Docs-only changes; +9/-6 lines net.

@claude review

@claude

claude Bot commented Apr 26, 2026

Claude finished @bootjp's task in 3m 26s


Review

  • Read updated design documents
  • Verify fix: §3.1 SetQueueAttributes bucket invalidation (3.C) — medium
  • Verify fix: §4.3 bucket key (queueName, partitionID) vs MessageGroupId (3.C) — low
  • Verify fix: §5 --sqsFifoPartitionMap dedicated flag (3.D) — medium
  • Verify fix: §9 FifoThroughputLimit=perQueue test case (3.D) — low
  • Check for remaining open Codex P1 items from prior rounds
  • Check for new issues
  • Overall assessment

All four issues from the trigger are correctly resolved. Two Codex P1 findings first surfaced in my fourth-round review — and noted there as "not in scope of this push" — remain open in the current doc. One new low-severity finding.


All four trigger fixes: resolved ✓

3.C — §3.1 SetQueueAttributes bucket cache invalidation (medium)
The paragraph is explicit and accurate: after the Raft commit, the handler calls buckets.Delete(key) for all three action-keys (Send, Recv, Default). The race with LoadOrStore is correctly noted as benign (fresh bucket starts at full capacity, same as failover). The 1h idle-eviction context makes the urgency of the invalidation clear.
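As a reader aid, a sketch of that invalidation path under the doc's sync.Map shape. The helper name is hypothetical, and the action values below are the bucketKey struct's canonical ones; a later round corrects the Recv/Default naming this paragraph still uses:

```go
package throttle

import "sync"

type bucketKey struct {
	queue  string
	action string // "Send" | "Receive" | "*"
}

type bucketStore struct{ buckets sync.Map } // map[bucketKey]*tokenBucket

// invalidateQueueBuckets runs after the Raft commit in
// SetQueueAttributes (and, per a later round, DeleteQueue). Dropping
// all three action keys forces the next request to rebuild its bucket
// from the freshly committed meta instead of waiting out the 1h sweep.
func (s *bucketStore) invalidateQueueBuckets(queue string) {
	for _, action := range []string{"Send", "Receive", "*"} {
		s.buckets.Delete(bucketKey{queue: queue, action: action})
	}
}
```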

3.C — §4.3 bucket key (queueName, partitionID) (low)
The corrected key is explicit, the MessageGroupId-as-key footgun is called out with the reason (unbounded attacker-amplifiable map), and the bucketKey struct is noted to gain a partition uint32 field when Phase 3.D lands. Consistent with §3.1's struct definition.

3.D — §5 --sqsFifoPartitionMap dedicated flag (medium)
The new flag is introduced with clear syntax (orders.fifo:8=group-7,...group-14), the separation from --raftSqsMap's endpoint-mapping role is stated explicitly, and the reason (parsing ambiguity → wrong proxy target) is documented. The rollout table (§11, PR 4) also references the new flag by name.
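A parser sketch for the new flag, assuming the grammar is queueName:partitionCount=comma-separated-group-list as the example syntax suggests; the function name, exact grammar, and error handling are illustrative only:

```go
package adapter

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFifoPartitionMap splits one --sqsFifoPartitionMap entry into its
// queue name, declared partition count, and per-partition Raft groups.
func parseFifoPartitionMap(v string) (queue string, n int, groups []string, err error) {
	head, tail, ok := strings.Cut(v, "=")
	if !ok {
		return "", 0, nil, fmt.Errorf("missing '=' in %q", v)
	}
	queue, countStr, ok := strings.Cut(head, ":")
	if !ok {
		return "", 0, nil, fmt.Errorf("missing ':' in %q", head)
	}
	n, err = strconv.Atoi(countStr)
	if err != nil {
		return "", 0, nil, fmt.Errorf("bad partition count %q: %w", countStr, err)
	}
	groups = strings.Split(tail, ",")
	if len(groups) != n {
		return "", 0, nil, fmt.Errorf("declared %d partitions, got %d groups", n, len(groups))
	}
	return queue, n, groups, nil
}
```

Keeping this parser separate from the --raftSqsMap one is the point of the fix: the two grammars never meet, so a partition assignment cannot be mistaken for an endpoint pair.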

3.D — §9 FifoThroughputLimit=perQueue test case (low)
The bullet is present, includes the concrete call form (FifoThroughputLimit = "perQueue" with PartitionCount = 8 → partition 0 for every group ID), and the regression rationale is spelled out ("the perQueue branch is a one-line guard that could easily be dropped during a refactor"). Good.


Still-open Codex P1 findings from d424d18d7c (surfaced in my last review, not addressed in this push)

These were called out as "not in scope of this push" in my fourth-round review summary but were not deferred to a follow-up issue — they are design-level gaps the implementation PRs will have to resolve.

[Medium] docs/design/2026_04_26_proposed_sqs_per_queue_throttling.md §3.1 — DeleteQueue does not invalidate the bucket cache

§3.1 now correctly invalidates buckets on SetQueueAttributes, but says nothing about DeleteQueue. A queue deleted and recreated with the same name inherits the old in-memory bucket's token balance and — if the new queue was created without throttle config — the old bucket's capacity/refillRate as well, until the idle-eviction sweep (1h default). An operator who uses DeleteQueue+CreateQueue to reset a noisy queue's state gets a surprise: the throttle limits from the deleted queue follow the new one for up to an hour.

The fix is one sentence: "DeleteQueue calls buckets.Delete for all action-keys of the queue (same as SetQueueAttributes), so a recreated same-name queue starts with a clean bucket." Alternatively, include generation in bucketKey so old and new incarnations never share a bucket entry.

[Medium] docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md §4.2 — WaitTimeSeconds not bounded across the partition fanout

The fanout iterates partitions sequentially until MaxNumberOfMessages are collected. If each proxied sub-call independently long-polls for WaitTimeSeconds, an empty partitioned queue with WaitTimeSeconds = 20 and PartitionCount = 8 holds the connection for up to 160 s before returning empty — well past any reasonable client or load-balancer timeout. The doc notes "FIFO consumers are expected to use long-poll (WaitTimeSeconds)" as a mitigant for the N-round-trip cost, which inadvertently encourages the very configuration that creates the problem.

§4.2 should specify how the wait budget is managed: either (a) a single top-level deadline is threaded through as remaining_wait = WaitTimeSeconds - elapsed_so_far and passed to each sub-call, so the total wait is bounded by WaitTimeSeconds; or (b) partitions are probed in parallel and the first non-empty result short-circuits. Option (a) matches how AWS's internal partition fanout works; option (b) is simpler to implement but changes the per-partition polling semantics. Either choice should be explicit.


New finding

[Low] docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md §3.4 — reference to --raftSqsMap for partition routing contradicts §5

§3.4 states: "The existing kv/shard_router.go plus the multi-shard configuration flags (--raftSqsMap) already support this — each (queueName, partition) pair becomes its own routing key."

--raftSqsMap maps raftAddr=sqsAddr (endpoint pairs for proxyToLeader); it does not map (queueName, partition) → Raft group. §5 is explicit that partition-to-Raft-group assignment requires the new --sqsFifoPartitionMap flag and that --raftSqsMap is unchanged by this design. An implementor reading §3.4 would conclude partition routing is free with the existing flag, reach §5, and be confused about what needs to be added. The §3.4 sentence should say that while the router infrastructure (kv/shard_router.go) supports multi-group routing today, the partition-to-group assignment configuration requires the new --sqsFifoPartitionMap flag described in §5.


Summary

All four trigger fixes are cleanly applied. The two Codex P1 items that have been open since d424d18d7cDeleteQueue bucket lifecycle and WaitTimeSeconds budget management across the fanout — are the only findings I'd call blocking before implementation PRs start. Both are design decisions the implementation code has to embody either way; resolving them in the doc now prevents the PR author from having to reverse a committed approach. The §3.4/§5 inconsistency is low-severity cleanup.

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f6bd33ab70


Comment on lines +153 to +155
- All four `Send*` / `Recv*` fields must be either both zero (disabled) or both positive.
- Capacity ≥ refill (otherwise the bucket can never burst above the steady state).
- A hard ceiling per queue (e.g. 100,000 RPS) so a typo (`SendCapacity = 1e9`) does not silently mean "no limit at all" but rejects with `InvalidAttributeValue`.


P1: Require bucket capacity to cover max batch token cost

The validator currently only enforces paired positivity and capacity >= refill, but §3.3 charges batch APIs by len(Entries) (up to 10). With configs like SendCapacity=5, a valid SendMessageBatch of 10 can never succeed because the bucket can never accumulate 10 tokens, so clients get permanent throttling and misleading Retry-After values. Please add validation that per-action capacity is at least the maximum single-request charge (or change the charging rule) to avoid creating unrecoverable throttling states.
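A token-bucket charge sketch (illustrative names; the refill loop is elided) that makes the failure mode concrete. The Retry-After arithmetic matches the formula cited later in this thread, ceil((cost - tokens) / refillRate):

```go
package throttle

import (
	"math"
	"time"
)

type tokenBucket struct {
	tokens     float64 // current balance
	capacity   float64 // maximum balance the refill loop will restore
	refillRate float64 // tokens per second
}

// charge debits cost tokens (len(Entries) for batch verbs). When the
// bucket is short, it reports how long until the balance reaches cost.
// If cost > capacity, that moment never arrives: the hint is finite
// but the request is permanently unserviceable.
func (b *tokenBucket) charge(cost float64) (ok bool, retryAfter time.Duration) {
	if b.tokens >= cost {
		b.tokens -= cost
		return true, 0
	}
	secs := math.Ceil((cost - b.tokens) / b.refillRate)
	return false, time.Duration(secs) * time.Second
}
```

With capacity=5 and refillRate=1, a cost-10 SendMessageBatch gets retryAfter=5s, waits, refills to the 5-token cap, and is throttled again, forever. That loop is what the proposed capacity-covers-max-charge rule forbids.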


Comment on lines +273 to +275
5. **Mixed-version cluster**: a rolling upgrade where some nodes have HT-FIFO and others don't. The new feature gates on the queue's `PartitionCount > 1` field, which is set at create time; old nodes that try to scan a partitioned queue's keyspace will simply not find anything (the prefix has changed). The catalog rejects `CreateQueue` with `PartitionCount > 1` until every node in the cluster reports the new feature flag.

**The capability advertisement mechanism**: each node's existing `/sqs_health` endpoint (`adapter/sqs.go: serveSQSHealthz`) gains a new field in its JSON body — `capabilities: ["htfifo"]` once this PR's code is in the binary. The catalog's CreateQueue handler reads the live node set from the distribution layer's node registry (the same registry used by `proxyToLeader` to locate leaders), polls `/sqs_health` on each, and gates `PartitionCount > 1` on every node reporting the `htfifo` capability. Nodes that don't respond within a short timeout are treated as not-yet-upgraded — a deliberate fail-closed default so a network blip does not let a partitioned queue land in a partially-upgraded cluster. This mirrors the §3.3.2 admin-forwarding upgrade gate from the admin dashboard design (PR #644), which uses the same "all-nodes-must-report" pattern for `AdminForward`.


P1: Enforce HT-FIFO capability beyond queue creation

This guard only blocks CreateQueue when not all nodes advertise htfifo, but the same section notes older nodes will not see partitioned-keyspace data. That leaves a failure mode where a downgraded/replaced old node joins after creation, becomes leader for a partitioned queue, and serves false-empty reads/writes against the wrong prefix. The proposal should require a runtime safeguard (startup or leadership admission) that prevents non-HT-FIFO binaries from serving once any PartitionCount > 1 queue exists.


…conds, §3.4)

Three findings from the fifth-round Claude review on PR #664:

3.C per-queue throttling — DeleteQueue bucket lifecycle (medium):
§3.1 now requires DeleteQueue to call buckets.Delete(key) for every
bucketKey of the deleted queue (Send, Recv, Default), so a same-name
recreation does not inherit stale token balance/limits for up to 1h.
Alternative (generation in bucketKey) considered and rejected as more
expensive than an explicit Delete.

3.D split-queue FIFO — WaitTimeSeconds budget across fanout (medium):
§4.2 now threads a shared deadline through the partition fanout
(step 3) and passes remainingWait per sub-call (step 4a/c/d). Without
this, an empty queue with WaitTimeSeconds=20 + PartitionCount=8 would
hold the connection for up to 160s. §8.1 also drops the contradictory
"long-poll mitigates the cost" sentence — long-poll was actually the
trigger for the bug.

3.D split-queue FIFO — §3.4 --raftSqsMap reference (low): rephrased
to distinguish the router infrastructure (kv/shard_router.go, which
already supports multi-group routing by key) from the configuration
surface (the new --sqsFifoPartitionMap flag in §5). Eliminates the
contradiction with §5.
@bootjp
Owner Author

bootjp commented Apr 26, 2026

Pushed a commit addressing all three findings from your fifth-round review (2 medium, 1 low):

3.C per-queue throttling — §3.1 DeleteQueue bucket lifecycle (medium):

  • New paragraph immediately after the SetQueueAttributes invalidation block. DeleteQueue now also calls buckets.Delete(key) for every bucketKey of the deleted queue (Send, Recv, Default), so a DeleteQueue+CreateQueue with the same name no longer inherits the previous incarnation's token balance/limits for up to 1h. Documented the generation-in-bucketKey alternative and the reason for picking the explicit Delete (cheaper, mirrors the SetQueueAttributes pattern).

3.D split-queue FIFO — §4.2 WaitTimeSeconds budget across fanout (medium):

  • Pseudocode reworked: step 3 sets deadline := start + WaitTimeSeconds (cap 20s); step 4a/4c/4d compute remainingWait := max(0, deadline - now()) and pass it as the per-sub-call long-poll budget. Total wall-clock wait is now bounded by the original WaitTimeSeconds regardless of PartitionCount. Without this, an empty queue with WaitTimeSeconds=20 + PartitionCount=8 would hold the connection for up to 160s.
  • Added a "Why the shared deadline matters" paragraph documenting the bug class and the rejected alternative (parallel probe with shared cancellation — rejected because it erases the partitionOrder rotation).
  • §8.1 dropped the contradictory "long-poll mitigates the RTT cost" sentence; long-poll was actually the trigger for the bug.

3.D split-queue FIFO — §3.4 --raftSqsMap reference (low):

  • Rephrased to distinguish the router infrastructure (kv/shard_router.go, which already supports multi-group routing by key) from the configuration surface (--sqsFifoPartitionMap, the new flag from §5). Calls out explicitly that conflating the two would parse partition assignments as endpoint pairs and route to the wrong leader. Aligns with §5.

@claude review

@claude

claude Bot commented Apr 26, 2026

Claude finished @bootjp's task in 3m 20s


Review

  • Read updated design documents
  • Verify fix: §3.1 DeleteQueue bucket invalidation (3.C) — medium
  • Verify fix: §4.2 WaitTimeSeconds shared deadline across fanout (3.D) — medium
  • Verify fix: §3.4 --raftSqsMap reference cleanup (3.D) — low
  • Check open Codex P1 items from last round
  • Check for new issues
  • Overall assessment

All three fixes from the trigger are correctly applied. Two Codex P1 items raised in the sixth-round review remain open and are the only findings I would consider blocking before implementation PRs. One new low-severity finding.


All three trigger fixes: resolved ✓

3.C — §3.1 DeleteQueue bucket lifecycle (medium)
The paragraph is explicit and accurate: after the Raft commit that purges the queue meta, buckets.Delete(key) is called for all three action-keys. The race with LoadOrStore is correctly noted as benign. The alternative (generation-in-bucketKey) is documented and the reason for rejecting it is clear. The 1 h idle-eviction context makes the urgency obvious.

3.D — §4.2 WaitTimeSeconds shared deadline (medium)
Step 3 sets deadline := start + WaitTimeSeconds (capped at 20s); step 4a computes remainingWait := max(0, deadline - now()) and passes it to each sub-call. The "Why the shared deadline matters" paragraph correctly explains the 160 s pathological case that the fix eliminates, and the rejected parallel-probe alternative is documented with the reason (erases partitionOrder rotation semantics). §8.1 no longer contains the contradictory long-poll mitigation sentence.
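A sketch of the shared-deadline fanout as verified above; partition probing is abstracted behind a callback, and all names are illustrative rather than the doc's pseudocode verbatim:

```go
package htfifo

import "time"

// receiveFanout probes partitions in rotation order under one shared
// deadline. probePartition stands in for the local-scan-or-proxy step
// and must honor the wait budget it is handed.
func receiveFanout(partitionOrder []int, waitTime time.Duration,
	maxMessages int, probePartition func(p int, wait time.Duration) []string) []string {

	if waitTime > 20*time.Second { // SQS caps WaitTimeSeconds at 20
		waitTime = 20 * time.Second
	}
	deadline := time.Now().Add(waitTime)

	var msgs []string
	for _, p := range partitionOrder {
		remaining := time.Until(deadline)
		if remaining < 0 {
			remaining = 0 // budget spent: later probes become non-blocking peeks
		}
		msgs = append(msgs, probePartition(p, remaining)...)
		if len(msgs) >= maxMessages {
			return msgs[:maxMessages]
		}
	}
	return msgs
}
```

Total wall-clock wait is bounded by waitTime no matter how many partitions exist, which is exactly the property the 160 s pathological case lacked.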

3.D — §3.4 --raftSqsMap reference cleanup (low)
The distinction is now explicit: kv/shard_router.go already supports multi-group routing (no changes needed there), but the configuration surface for partition-to-group assignment is --sqsFifoPartitionMap (§5), not --raftSqsMap. The consequence of conflating the two ("parse partition assignments as endpoint pairs and route to the wrong leader") is called out directly.


Still-open Codex P1 findings from f6bd33ab70 (sixth-round Codex review)

These were raised in the commit-by-commit Codex review on PR #664 and have not yet been addressed.

[Medium] docs/design/2026_04_26_proposed_sqs_per_queue_throttling.md §3.2 — validator does not bound capacity to max single-request charge

The validator enforces capacity >= refill but not capacity >= max_single_request_charge. The maximum single-request charge from the Send bucket is 10 (a full SendMessageBatch). A queue configured with SendCapacity = 5 creates a state where every SendMessageBatch of > 5 entries is permanently unserviceable — the bucket can only hold 5 tokens and can never accumulate 10. The client receives Throttling forever, Retry-After grows proportionally, and there is no way to recover without reconfiguring the queue. The Retry-After formula (§3.4) correctly computes ceil((10 - 5) / refillRate) but that only tells the client "wait N seconds" before they'll be throttled again.

The fix is one validation rule: each capacity that covers a batch verb must be at least the maximum charge of any single request through that bucket. For Send: SendCapacity >= 10. For Recv: RecvCapacity >= 10 (DeleteMessageBatch is also up to 10). Document the constraint in §3.2 and reject with InvalidAttributeValue at SetQueueAttributes.
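A minimal validator sketch of that rule; the helper name, error shape, and the attribute-name parameter are assumptions:

```go
package throttle

import "fmt"

// maxBatchEntries is the largest single-request charge through a
// bucket: SendMessageBatch and DeleteMessageBatch carry up to 10 entries.
const maxBatchEntries = 10

// validateCapacityCoversBatch rejects a non-zero capacity that is too
// small for the largest possible single charge, the state in which a
// full batch is permanently unserviceable.
func validateCapacityCoversBatch(attrName string, capacity int) error {
	if capacity > 0 && capacity < maxBatchEntries {
		return fmt.Errorf("InvalidAttributeValue: %s must be >= %d (max batch charge) or 0 to disable",
			attrName, maxBatchEntries)
	}
	return nil
}
```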

[Medium] docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md §8 item 5 — no runtime safeguard against a downgraded node becoming leader after HT-FIFO queue creation

§8 item 5 says "the catalog rejects CreateQueue with PartitionCount > 1 until every node reports htfifo" — a create-time gate. But the same item acknowledges "older nodes that try to scan a partitioned queue's keyspace will simply not find anything." There is no runtime protection against the following sequence:

  1. All nodes have htfifo; a partitioned queue orders.fifo is created.
  2. Node A is upgraded back (or replaced by a rollback image) and rejoins without htfifo.
  3. Node A is elected leader for the partition-3 Raft group.
  4. Node A's ReceiveMessage scans the old prefix (!sqs|msg|data|orders.fifo|...) and finds nothing — false-empty reads.
  5. Node A's SendMessage writes the old key format — data on partition 3 is written to the wrong prefix and the reaper won't find it under the new prefix.

The create-time gate does not cover this; it only prevents new partitioned queues from being created in a mixed-version cluster. Once a partitioned queue exists, any node that serves it must also understand the new prefix.

The proposal needs a runtime safeguard: a node that does not have htfifo should refuse to become leader (or refuse to serve the SQS handler) for a Raft group that contains any queue with PartitionCount > 1. A practical implementation: on startup (or on leadership acquisition), query the catalog for partitioned queues in the local shard; if any exist and the binary lacks htfifo, refuse leadership with an explicit log message. This is the node-admission variant of the create-time gate.
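A sketch of that node-admission guard. Everything here is hypothetical scaffolding: the catalog lookup, the capability flag, and the transfer hook stand in for the real kv/lease_state.go integration the doc later specifies:

```go
package htfifo

import "log"

// admitLeadership is called on startup and on leadership acquisition.
// A binary without the htfifo capability must not lead a Raft group
// that hosts any partitioned queue; failing closed here prevents
// false-empty reads and wrong-prefix writes from a rolled-back node.
func admitLeadership(
	hasHTFIFO bool,
	partitionCountByQueue map[string]uint32, // from the catalog, local shard only
	transferLeadership func() error,
) bool {
	if hasHTFIFO {
		return true
	}
	for q, n := range partitionCountByQueue {
		if n > 1 {
			log.Printf("refusing leadership: queue %q has PartitionCount=%d but this binary lacks htfifo", q, n)
			_ = transferLeadership() // group may go leaderless: the desired fail-closed outcome
			return false
		}
	}
	return true
}
```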


New finding

[Low] docs/design/2026_04_26_proposed_sqs_per_queue_throttling.md §3.1 — bucketKey struct definition and DeleteQueue paragraph are inconsistent on whether partitionID is in the current key

The struct definition in §3.1 is:

```go
type bucketKey struct {
    queue  string
    action string  // "Send" | "Receive" | "*"
}
```

But the DeleteQueue paragraph (added in this latest push) says:

"so we keep bucketKey = (queueName, partitionID, action) for now"

partitionID is not in the current struct. §4.3 explicitly says the struct "grows a partition uint32 field" when Phase 3.D lands. The DeleteQueue paragraph should say (queueName, action); partitionID is a future addition. An implementor reading §3.1 end-to-end will see the struct definition conflict with the inline text, and if they include partitionID in Phase 3.C based on the paragraph, the §4.3 forward-compat note will be confusing.


Summary

All three trigger fixes are cleanly applied. The two Codex P1 items from the sixth-round review — bucket capacity bound on max batch charge (3.C §3.2) and runtime safeguard for downgraded nodes post-HT-FIFO creation (3.D §8) — are the only findings I'd call blocking before the implementation PRs. Both represent design gaps that will have to be resolved in code regardless; better to settle the spec now. The bucketKey inconsistency is a low-severity cleanup.

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 624d411a93



No global lock is held during step 3; concurrent traffic on different queues runs in parallel.

**Cache invalidation on `SetQueueAttributes`**: when an operator updates the throttle config via `SetQueueAttributes`, the handler — *after* the Raft commit that persists the new `sqsQueueThrottle` — calls `buckets.Delete(key)` for every `bucketKey` belonging to the updated queue (`Send`, `Recv`, `Default`). Without this step the in-memory bucket would keep enforcing the old limits until the idle-eviction sweep removes the stale entry (default 1 h window), defeating the operator's intent to throttle a noisy tenant in real time. The `LoadOrStore` race with the `Delete` call is benign: the next request rebuilds from the freshly-committed meta, and the rebuilt bucket starts at full capacity (same semantics as the failover case documented below). Claude P1 on PR #664 caught the gap.


P1: Use canonical action keys when deleting throttle buckets

The design defines bucketKey.action as "Send" | "Receive" | "*", but this invalidation step deletes ("Send", "Recv", "Default"). If implemented as written, Receive and default buckets will not be removed on SetQueueAttributes, so updated limits can remain stale until idle eviction (and the same mismatch appears in the DeleteQueue path), which directly defeats the real-time reconfiguration behavior this section is trying to guarantee.



`PartitionCount` is **immutable after first SendMessage**. The validator on `SetQueueAttributes` rejects any change; operators who want a different partition count create a new queue. Why immutable: changing it would require re-hashing every existing message into a new partition, which (a) breaks ordering for in-flight messages of every group whose hash bucket changed, and (b) is a multi-second / multi-minute operation that cannot be expressed as one OCC transaction.


P1: Keep FIFO routing attributes immutable after first send

This section only freezes PartitionCount, but routing also depends on FifoThroughputLimit (perQueue forces partition 0 in partitionFor). Allowing SetQueueAttributes to flip throughput mode after messages exist can move the same MessageGroupId to a different partition mid-stream, splitting one group across partitions and violating FIFO within-group ordering; the proposal should lock routing-affecting attributes after first send (or require queue recreation).


…rade, key)

Three findings from the sixth-round review on PR #664:

3.C per-queue throttling §3.2 — capacity bound on max batch charge
(Codex P1, medium): SetQueueAttributes validator now requires capacity
>= max single-request charge. Send/Recv buckets must be >= 10 when
non-zero (full SendMessageBatch / DeleteMessageBatch is 10 entries).
Without this rule, SendCapacity=5 makes every full batch permanently
unserviceable — the bucket can never accumulate enough tokens.

3.D split-queue FIFO §8 item 5 — runtime safeguard for downgraded
leaders (Codex P1, medium): create-time gate only blocks new
partitioned queues. Once one exists, a downgraded node lacking
htfifo capability could be elected leader and produce false-empty
reads + key-prefix corruption. Adds a node-admission rule:
on startup + leadership acquisition, enumerate local-shard queues
with PartitionCount > 1; if any and the binary lacks htfifo,
refuse leadership and step down via TransferLeadership.

3.C per-queue throttling §3.1 — DeleteQueue paragraph bucketKey
text (Claude low): "(queueName, partitionID, action)" contradicted
the struct definition above it (no partitionID until Phase 3.D §4.3).
Corrected to "(queueName, action) for Phase 3.C".
@bootjp
Owner Author

bootjp commented Apr 26, 2026

Pushed a commit addressing all three findings from your sixth-round review (2 medium Codex P1 + 1 low Claude):

3.C per-queue throttling — §3.2 capacity ≥ max batch charge (Codex P1, medium):

  • New fourth bullet on the SetQueueAttributes validator: when SendCapacity > 0 it must also be ≥ 10, and likewise RecvCapacity > 0 → ≥ 10. Justified inline against the §3.3 charging table (SendMessageBatch / DeleteMessageBatch charge up to 10 each). Documented the failure mode the rule prevents (permanently unserviceable batches with no recovery short of re-running SetQueueAttributes) and the rejection envelope (InvalidAttributeValue with the per-bucket minimum named so the operator sees the cause). Default* is exempt because no batch verb is in scope of the catch-all bucket today; explicit forward note for when one moves there.

3.D split-queue FIFO — §8 item 5 runtime safeguard for downgraded leaders (Codex P1, medium):

  • New paragraph after the capability advertisement mechanism. Describes the failure sequence the create-time gate misses (downgrade after a partitioned queue exists → false-empty reads + key-prefix corruption from the rolled-back leader). Specifies the safeguard: on startup and on leadership acquisition for an SQS Raft group, enumerate local-shard queues with PartitionCount > 1; if any exist and the binary lacks htfifo, refuse leadership with an explicit log line and TransferLeadership away (group becomes leaderless if no peer is willing — desired fail-closed behaviour). Hooks into the existing leadership-acquisition path in kv/lease_state.go so steady-state cost is zero. Explicitly required before the §11 PR 4 rollout step that marks the binary htfifo-eligible.

3.C per-queue throttling — §3.1 bucketKey text/struct consistency (Claude low):

  • The DeleteQueue paragraph said "we keep bucketKey = (queueName, partitionID, action) for now" but the struct definition above has only queue + action (no partitionID until Phase 3.D per §4.3). Corrected to "for Phase 3.C we keep bucketKey = (queueName, action) per the struct definition above — the partition field is added later by Phase 3.D, see §4.3".

@claude review

@claude

claude Bot commented Apr 26, 2026

Claude finished @bootjp's task in 3m 10s


Review

  • Read updated design documents
  • Verify fix: §3.2 capacity ≥ max batch charge (3.C) — Codex P1 medium
  • Verify fix: §8 item 5 runtime safeguard for downgraded leaders (3.D) — Codex P1 medium
  • Verify fix: §3.1 bucketKey text/struct consistency (3.C) — Claude low
  • Check for new issues (including open Codex P1 from last round)
  • Overall assessment

All three fixes from this push are correctly applied. Two Codex P1 findings from the previous round's Codex review (on commit 624d411a93) were not in scope of this push and remain open. One new low-severity finding.


All three trigger fixes: resolved ✓

3.C — §3.2 capacity ≥ max batch charge (Codex P1, medium)
The fourth validation bullet is precise: SendCapacity > 0 → SendCapacity ≥ 10, RecvCapacity > 0 → RecvCapacity ≥ 10, justified by the §3.3 charging table. The failure mode (permanent unserviceability for full batches) is documented inline, the rejection envelope (InvalidAttributeValue) is named, and the Default* exemption is correctly scoped with an explicit forward note. The Codex P1 attribution is clear. Clean.

3.D — §8 item 5 runtime safeguard for downgraded leaders (Codex P1, medium)
The new paragraph accurately describes the failure sequence the create-time gate misses (downgrade after partitioned queue exists → false-empty reads + key-prefix corruption). The safeguard is concrete: catalog enumeration on startup + leadership acquisition, TransferLeadership on capability mismatch, fail-closed if no willing peer. The hook into kv/lease_state.go, the zero steady-state cost claim, and the ordering constraint relative to §11 PR 4 are all present and correct.

3.C — §3.1 bucketKey text/struct consistency (Claude low)
The DeleteQueue paragraph now reads "for Phase 3.C we keep bucketKey = (queueName, action) per the struct definition above — the partition field is added later by Phase 3.D, see §4.3." Consistent with the struct definition on lines 92–95 and the forward-compat note in §4.3. The previous partitionID mention is gone.


Still-open Codex P1 findings from the commit 624d411a93 Codex review

These were new findings added by Codex on the last commit round; they were not in scope of this push.

[Medium] docs/design/2026_04_26_proposed_sqs_per_queue_throttling.md §3.1 — action key names in cache invalidation don't match struct definition

The bucketKey.action field is defined at line 94 as "Send" | "Receive" | "*". But both the SetQueueAttributes and DeleteQueue invalidation paragraphs (lines 114 and 116) delete keys for "Send", "Recv", "Default":

calls buckets.Delete(key) for every bucketKey belonging to the updated queue (Send, Recv, Default)

"Recv""Receive" and "Default""*". An implementation that follows the struct definition will never match the keys the invalidation code tries to delete, so Receive and Default buckets will remain stale after a config update — defeating the correctness guarantee the SetQueueAttributes invalidation paragraph is trying to establish. The fix is one-word changes in both paragraphs: use "Receive" and "*" (matching the struct), or update the struct comment to use "Recv" / "Default" and define those as the canonical values. Fix this →

[Medium] docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md §3.2 — FifoThroughputLimit and DeduplicationScope are mutable after first send

Line 106 states only PartitionCount is immutable after first SendMessage. But routing in partitionFor also depends on FifoThroughputLimit — the perQueue short-circuit (added in an earlier round) collapses all groups to partition 0 regardless of PartitionCount. If an operator flips FifoThroughputLimit from perMessageGroupId to perQueue after messages exist, every MessageGroupId that was previously distributed across N partitions now routes to partition 0, splitting those groups across multiple partitions and violating within-group FIFO ordering for any in-flight messages. Similarly, DeduplicationScope affects how the dedup key is scoped; changing it live would either duplicate or suppress messages unexpectedly.

The fix is extending the immutability sentence: "PartitionCount, FifoThroughputLimit, and DeduplicationScope are all immutable after first SendMessage" — since all three affect routing or dedup semantics in ways that cannot be changed without a full drain.
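A routing sketch that shows why the mutability matters. The hash choice (FNV-1a) and the power-of-two mask are from the proposal; the function shape and attribute spellings are illustrative:

```go
package htfifo

import "hash/fnv"

// partitionFor routes a message group to a partition. The perQueue
// short-circuit pins every group to partition 0; flipping the
// throughput mode after messages exist would therefore move groups
// between partitions mid-stream, which is the ordering hazard above.
func partitionFor(messageGroupID, fifoThroughputLimit string, partitionCount uint32) uint32 {
	if fifoThroughputLimit == "perQueue" || partitionCount <= 1 {
		return 0
	}
	h := fnv.New32a()
	h.Write([]byte(messageGroupID))
	return h.Sum32() & (partitionCount - 1) // valid because partitionCount is a power of two
}
```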


New finding

[Low] docs/design/2026_04_26_proposed_sqs_per_queue_throttling.md §6 — integration test example uses an attribute value that §3.2 now rejects

§6 item 2 reads:

"Configure a queue with SendCapacity=5 SendRefillPerSecond=1. Send 5 messages back-to-back → all 200."

But §3.2's new fourth validation bullet requires SendCapacity ≥ 10 when non-zero. A SetQueueAttributes call with SendCapacity=5 would now return InvalidAttributeValue before the test even creates its first message — meaning the test as written would fail at setup, not at message 6. The example should use SendCapacity=10 (or higher) so the test is consistent with the validator it is supposed to be testing against.


Summary

All three trigger fixes are cleanly applied. The two Codex P1 findings from the previous round's review — action key naming mismatch in cache invalidation (3.C §3.1) and FifoThroughputLimit / DeduplicationScope mutability (3.D §3.2) — are the only items I'd consider blocking before implementation PRs. Both are design-level decisions that implementation code would have to make regardless; getting the doc right now prevents the PRs from encoding the wrong choice. The integration test inconsistency is low-severity cleanup.

…immutability, test capacity)

Three findings from the seventh-round review on PR #664:

3.C per-queue throttling §3.1 -- action key naming consistency
(Codex P1, medium): bucketKey.action canonical values are
"Send"|"Receive"|"*" but cache invalidation paragraphs were saying
"Send","Recv","Default" (the JSON config field-name prefixes).
An implementation following the struct would never match the
wrong key names, leaving Receive and "*" buckets stale after
config changes. Changed both invalidation paragraphs to use the
struct vocabulary, and added a "Config-field -> bucket-action
mapping" paragraph after the sqsQueueThrottle struct so the
short-form / canonical-form distinction is explicit.

3.D split-queue FIFO §3.2 -- FifoThroughputLimit and
DeduplicationScope must also be immutable (Codex P1, medium):
prior text only marked PartitionCount as immutable. But changing
FifoThroughputLimit (perMessageGroupId<->perQueue) flips the §3.3
short-circuit on partition routing -- in-flight messages stay where
they are, new sends route differently, within-group FIFO ordering
silently breaks. Changing DeduplicationScope reshapes the dedup
key so the prior scope's de-duped messages can resurrect or new
sends can be incorrectly suppressed. Extended the immutability
sentence to cover all three with one bullet per attribute
explaining the failure mode.

3.C per-queue throttling §6 item 2 -- test capacity inconsistent
with new validator (Claude low): integration test used
SendCapacity=5 but §3.2's new fourth bullet requires
SendCapacity >= 10. Test would have failed at setup, not on the
6th send. Bumped to SendCapacity=10 / 10 messages back-to-back /
11th throttled, with an inline note explaining why the value
changed.
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@bootjp
Owner Author

bootjp commented Apr 26, 2026

Pushed a commit addressing all three findings from your seventh-round review (2 medium Codex P1 + 1 low Claude):

3.C per-queue throttling — §3.1 action key naming consistency (Codex P1, medium):

  • Both invalidation paragraphs (SetQueueAttributes, DeleteQueue) now read (Send, Receive, *) — the canonical action values from the bucketKey struct above. The prior (Send, Recv, Default) was the JSON config field-name vocabulary, not the in-memory map key — implementations following the struct would have left Receive and * buckets stale.
  • Added a new "Config-field → bucket-action mapping" paragraph immediately after the sqsQueueThrottle struct definition. Explicitly maps Send* → bucketKey{action:"Send"}, Recv* → bucketKey{action:"Receive"}, Default* → bucketKey{action:"*"}, and pins which vocabulary to use where (config-field for the JSON contract / GetQueueAttributes response; bucket-action for in-memory bucket-store ops). Implementation must apply the mapping after SetQueueAttributes commit.
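A sketch of that mapping as a lookup table; the variable name is illustrative:

```go
package adapter

// configFieldToBucketAction pins the two vocabularies: JSON attribute
// prefixes (the SetQueueAttributes / GetQueueAttributes contract) on
// the left, canonical bucketKey.action values (the in-memory map key)
// on the right. Invalidation and charging iterate the right-hand side.
var configFieldToBucketAction = map[string]string{
	"Send":    "Send",
	"Recv":    "Receive",
	"Default": "*",
}
```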

3.D split-queue FIFO — §3.2 immutability extended to FifoThroughputLimit + DeduplicationScope (Codex P1, medium):

  • Sentence rewritten: "PartitionCount, FifoThroughputLimit, and DeduplicationScope are all immutable after first SendMessage." Three bullets explain the failure mode for each:
    • PartitionCount — re-hashing every message, breaks in-flight ordering, multi-second op (existing reasoning).
    • FifoThroughputLimit — flipping perMessageGroupId ↔ perQueue activates/deactivates the §3.3 short-circuit; in-flight messages stay where they are, new sends for the same group route to a different partition, within-group FIFO is silently violated.
    • DeduplicationScope — changing scope reshapes the dedup key ((queue, dedupId) vs (queue, partitionId, dedupId) vs (queue, MessageGroupId, dedupId)); narrowing scope can resurrect previously-deduped duplicates, widening can suppress legitimately-new sends that collide with still-cached prior-scope entries.
  • Concluding sentence pins the unifying invariant: each attribute participates in a routing or dedup decision whose correctness depends on every existing message having been written under one consistent value, and live mutation creates incompatible "before"/"after" sets the runtime cannot reconcile without a full drain.
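A guard sketch for the extended immutability rule, assuming a hypothetical meta shape and a flag marking "has seen its first SendMessage":

```go
package adapter

import "errors"

type fifoMeta struct {
	PartitionCount      uint32
	FifoThroughputLimit string
	DeduplicationScope  string
	HasEverSeenSend     bool // illustrative stand-in for "after first SendMessage"
}

// validateImmutableAfterFirstSend rejects SetQueueAttributes changes to
// any routing- or dedup-affecting attribute once messages exist.
func validateImmutableAfterFirstSend(old, next fifoMeta) error {
	if !old.HasEverSeenSend {
		return nil // still mutable before the first send
	}
	if next.PartitionCount != old.PartitionCount ||
		next.FifoThroughputLimit != old.FifoThroughputLimit ||
		next.DeduplicationScope != old.DeduplicationScope {
		return errors.New("InvalidAttributeValue: attribute is immutable after first SendMessage")
	}
	return nil
}
```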

3.C per-queue throttling — §6 item 2 test capacity (Claude low):

  • Bumped from SendCapacity=5 SendRefillPerSecond=1 / 5 messages / 6th throttled to SendCapacity=10 SendRefillPerSecond=1 / 10 messages / 11th throttled. Added a parenthetical inline note explaining why the value changed (the new §3.2 validator would have rejected SendCapacity=5 at setup).

@claude review

@bootjp bootjp requested a review from Copilot April 28, 2026 14:00
@bootjp
Owner Author

bootjp commented Apr 28, 2026

@claude review

@claude

claude Bot commented Apr 28, 2026

Claude encountered an error


I'll analyze this and get back to you.

Contributor

Copilot AI left a comment


Pull request overview

This PR adds SQS Phase 3 design proposals and (despite the PR’s “docs-only” framing) lands significant Go implementation + test changes to introduce per-queue throttling and HT-FIFO (split-queue FIFO) schema/validation scaffolding in the SQS adapter.

Changes:

  • Add design proposal docs for per-queue throttling (Phase 3.C) and HT-FIFO partitioning (Phase 3.D).
  • Implement per-queue token-bucket throttling (bucket store, config parsing/validation, request-path charging, cache invalidation) plus integration/unit tests.
  • Introduce HT-FIFO queue meta fields and validation (PartitionCount / FifoThroughputLimit / DeduplicationScope), including a temporary “dormancy gate”, with unit/integration tests.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Summary per file:

| File | Description |
| --- | --- |
| docs/design/2026_04_26_proposed_sqs_per_queue_throttling.md | Proposal doc for per-queue throttling behavior, config surface, and rollout/testing strategy. |
| docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md | Proposal doc for HT-FIFO partitioning model, routing, keyspaces, rollout plan, and failure modes. |
| adapter/sqs_throttle.go | New throttling implementation (bucket store, charging, invalidation, sweep loop, helpers). |
| adapter/sqs_throttle_test.go | Unit tests for bucket math, eviction/reconciliation races, config validation, and helpers. |
| adapter/sqs_throttle_integration_test.go | End-to-end tests for throttling behavior and SetQueueAttributes invalidation semantics. |
| adapter/sqs_partitioning.go | New HT-FIFO vocabulary, routing primitive (partitionFor), validators, and attribute rendering helpers. |
| adapter/sqs_partitioning_test.go | Unit tests for partition routing, validation, immutability, and idempotency helpers. |
| adapter/sqs_partitioning_integration_test.go | Wire-level integration tests for HT-FIFO validation/immutability and the temporary dormancy gate. |
| adapter/sqs_messages.go | Threads throttling checks into Send/Receive/Delete/ChangeVisibility paths; refactors SendMessage decode/validation. |
| adapter/sqs_messages_batch.go | Adds throttling checks for batch verbs (SendMessageBatch/DeleteMessageBatch/ChangeMessageVisibilityBatch). |
| adapter/sqs_catalog.go | Extends queue meta (Throttle + HT-FIFO fields), parsing/validation, attribute rendering, idempotency checks, and invalidation plumbing. |
| adapter/sqs_catalog_test.go | Extends CreateQueue idempotency tests to include Throttle* differences. |
| adapter/sqs.go | Adds server-owned bucket store lifecycle and JSON throttling error helper. |

Comment thread adapter/sqs_throttle.go Outdated
Comment on lines +619 to +625
// queueThrottleConfig loads the Throttle config and Generation off a
// queue's meta record. Generation participates in the bucket key, so
// it must travel with the throttle snapshot to avoid a stale-meta
// read that mints a fresh bucket under the wrong incarnation.
//
// Returns:
// - (cfg, gen, nil) on a successful read of an existing queue.

Copilot AI Apr 28, 2026


queueThrottleConfig’s comment refers to “Generation participates in the bucket key” and says it loads Generation, but it returns meta.Incarnation. Please update the comment to match the actual return values and the keying scheme.

Suggested change
// queueThrottleConfig loads the Throttle config and Generation off a
// queue's meta record. Generation participates in the bucket key, so
// it must travel with the throttle snapshot to avoid a stale-meta
// read that mints a fresh bucket under the wrong incarnation.
//
// Returns:
// - (cfg, gen, nil) on a successful read of an existing queue.
// queueThrottleConfig loads the Throttle config and Incarnation off a
// queue's meta record. Incarnation participates in the bucket key, so
// it must travel with the throttle snapshot to avoid a stale-meta
// read that mints a fresh bucket under the wrong incarnation.
//
// Returns:
// - (cfg, incarnation, nil) on a successful read of an existing queue.

Comment on lines +1 to +5
# Per-Queue Throttling and Tenant Fairness for the SQS Adapter

**Status:** Proposed
**Author:** bootjp
**Date:** 2026-04-26

Copilot AI Apr 28, 2026


The PR metadata/description says this is a docs-only PR with CI “irrelevant”, but this diff also adds/changes multiple Go implementation + test files (e.g., new throttling + HT-FIFO schema/validation code). Please update the PR title/description and test plan to reflect the code changes (or split docs vs implementation into separate PRs) so reviewers and release notes aren’t misled.

Comment thread adapter/sqs_catalog.go Outdated
//
// If meta.Throttle is empty (the IsEmpty short-circuit) the function
// also drops the empty struct so a round-trip GetQueueAttributes
// reports the queue as untrothttled rather than zero-valued. Mirrors

Copilot AI Apr 28, 2026


Typo in comment: “untrothttled” should be “unthrottled”.

Suggested change
// reports the queue as untrothttled rather than zero-valued. Mirrors
// reports the queue as unthrottled rather than zero-valued. Mirrors

Comment thread adapter/sqs_throttle.go Outdated
Comment on lines +176 to +177
// incarnation is sqsQueueMeta.Generation so a same-name recreate or
// cross-leader failover never reuses an older incarnation's tokens.

Copilot AI Apr 28, 2026


The doc comment for bucketStore.charge says “incarnation is sqsQueueMeta.Generation”, but the code and the rest of the design use Incarnation (CreateQueue-only) specifically to avoid PurgeQueue resetting buckets. Please update this comment to avoid future callers passing Generation by mistake.

Suggested change
// incarnation is sqsQueueMeta.Generation so a same-name recreate or
// cross-leader failover never reuses an older incarnation's tokens.
// incarnation is the queue's Incarnation value (the CreateQueue-only
// lifecycle identity), not sqsQueueMeta.Generation, so PurgeQueue does
// not reset buckets while same-name recreate or cross-leader failover
// still never reuses an older incarnation's tokens.

Comment thread adapter/sqs_throttle.go Outdated
Comment on lines +377 to +385
// Generation participates in the key (Codex P1 on PR #664): we do
// not know which incarnations have buckets cached, so range the map
// and remove any entry whose queue matches. A SetQueueAttributes
// invalidation on the same incarnation must drop the same-gen
// bucket so the new throttle config takes effect; a DeleteQueue /
// CreateQueue cycle would also drop any pre-existing incarnation's
// bucket here, although those entries also fall out via idle
// eviction since the new incarnation lands under a different key
// and the old key never sees traffic again.

Copilot AI Apr 28, 2026


invalidateQueue’s comment says “Generation participates in the key”, but bucketKey is (queue, action, incarnation). Please adjust the comment to refer to Incarnation to match the actual keying behavior.

@bootjp
Owner Author

bootjp commented Apr 28, 2026

@claude review


@claude

claude Bot commented Apr 28, 2026

Claude finished @bootjp's task in 6m 2s


Review

  • Read current state of implementation files (Rounds 10 + 11 + latest Copilot/CodeRabbit)
  • Pushed fixes: error leaking, stale comments, test header, typo, doc gaps
  • Overall assessment

Pushed 642ec0d addressing all actionable findings from the latest CodeRabbit and Copilot reviews. Below is the full accounting.


Fixes pushed (5 files, 642ec0d)

adapter/sqs_throttle.go — 4 comment/correctness fixes

  1. Error leaking in chargeQueue (CodeRabbit major)
    writeSQSError(w, ..., "throttle config read failed: "+err.Error()) replaced with writeSQSErrorFromErr(w, err). The old form passed wrapped Pebble/Raft error text to clients; writeSQSErrorFromErr sanitises it consistently with the rest of the adapter.

  2. charge docstring: "sqsQueueMeta.Generation" → "sqsQueueMeta.Incarnation" (Copilot)
    The comment at line 176 said "incarnation is sqsQueueMeta.Generation" — the opposite of the truth. Round 10 introduced the Generation/Incarnation split specifically so PurgeQueue does not reset buckets; the comment was not updated.

  3. invalidateQueue comment: "Generation participates" → "Incarnation participates" (Copilot)
    Same stale vocabulary, one block below.

  4. queueThrottleConfig docstring (Copilot)
    "loads the Throttle config and Generation" → "loads the Throttle config and Incarnation"; return-value annotation "cfg, gen, nil" → "cfg, incarnation, nil".

adapter/sqs_throttle_test.go — stale header comment (CodeRabbit minor)
The block comment above TestBucketStore_IncarnationKeyedDoesNotReuseAcrossIncarnations opened with the name of a different function (TestBucketStore_InvalidateQueueDropsAllActions) — a dead reference left from a rename. Replaced with a clean comment for the actual function.

adapter/sqs_catalog.go — typo (Copilot)
"untrothttled""unthrottled" at line 651.

docs/design/2026_04_26_proposed_sqs_per_queue_throttling.md §4.3 (CodeRabbit major)
Added an explicit "Budget semantics per partition" paragraph: each partition's bucket gets the full configured SendCapacity (not a divided share); aggregate throughput = N × limit; rationale (cross-shard coordination would cost a Raft round-trip per SendMessage, defeating partitioning); and the escape-hatch design for a future SendCapacityTotal aggregate attribute if needed.

docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md §7.2 (CodeRabbit major)
Dropped the "AWS-shape callers can omit PartitionCount and get a sensible default" sentence. PartitionCount is now required when multi-partition FIFO is requested; omission returns InvalidAttributeValue. Rationale: a server-side default makes CreateQueue idempotency depend on hidden deployment state, and PartitionCount is immutable from creation, so a wrong inferred value requires DeleteQueue+CreateQueue to fix.


One remaining design question (not blocking)

CodeRabbit major: fail-open after exhausting the 4-retry loop (sqs_throttle.go:218–221)

```go
// Should not happen — the for-loop drained without ever finding a
// live bucket. Treat as allowed=true (fail-open) so misconfiguration
// of the bucket store cannot produce a hard 429 storm.
return chargeOutcome{allowed: true, bucketPresent: false}
```

CodeRabbit argues this should fail-closed. The current code's rationale is that 4 retries exhausting without finding a live bucket indicates an extreme pathological state (repeated sweep+eviction of brand-new buckets within microseconds), and returning throttled would storm-throttle all traffic to the queue. The design choice is deliberate and the comment explains it.

My assessment: fail-open is the right default here — the retry loop is a guard against the sync.Map load-then-evict race, not an outer request retry. If 4 passes through the map all find evicted buckets, the bucket store is in a state that will almost certainly resolve on the next request (a new bucket will have been inserted by the time this goroutine returns). Failing closed would turn a transient internal race into a permanent 429 for the client. The comment should stay; no code change is needed.


Summary

All five CodeRabbit/Copilot findings from the round 11/12 reviews are addressed. No open items remain. The docs-plus-implementation PR is ready to merge.

…s (PR #664)

- chargeQueue: use writeSQSErrorFromErr instead of leaking err.Error() in body
- bucketStore.charge comment: "sqsQueueMeta.Generation" → "sqsQueueMeta.Incarnation"
- invalidateQueue comment: "Generation participates" → "Incarnation participates"
- queueThrottleConfig comment: "Throttle config and Generation" → "Throttle config and Incarnation"
- sqs_throttle_test.go: remove stale TestBucketStore_InvalidateQueueDropsAllActions
  header merged into the IncarnationKeyed test comment block
- sqs_catalog.go: typo "untrothttled" → "unthrottled"
- docs(3.C §4.3): document per-partition budget multiplication semantics explicitly
  (each partition gets full SendCapacity; aggregate = N × limit; rationale + escape hatch)
- docs(3.D §7.2): require explicit PartitionCount for multi-partition FIFO; drop
  server-side default inference (idempotency hazard per CodeRabbit major)

Co-authored-by: Yoshiaki Ueda (bootjp) <bootjp@users.noreply.github.com>

@coderabbitai (Bot) left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@adapter/sqs_catalog.go`:
- Around line 487-494: The PartitionCount attribute parser (the
"PartitionCount": func in the sqs catalog that sets sqsQueueMeta.PartitionCount)
should remove the //nolint:gosec suppression and instead parse with
strconv.ParseUint using a 64-bit size, then explicitly check that the parsed
value <= math.MaxUint32; if out of range return the same newSQSAPIError with
invalid attribute message, otherwise safely cast to uint32 and assign to
m.PartitionCount. Ensure you keep existing ParseUint error handling and return
the same HTTP bad request error for non-numeric input.
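A sketch of the bounds-checked parse CodeRabbit describes; the error sentinel is illustrative:

```go
package adapter

import (
	"errors"
	"math"
	"strconv"
)

var errInvalidPartitionCount = errors.New("InvalidAttributeValue: PartitionCount")

// parsePartitionCount proves the value fits in uint32 before casting,
// so no gosec suppression is needed on the conversion.
func parsePartitionCount(v string) (uint32, error) {
	n, err := strconv.ParseUint(v, 10, 64)
	if err != nil || n > math.MaxUint32 {
		return 0, errInvalidPartitionCount
	}
	return uint32(n), nil // safe: n <= math.MaxUint32 was checked above
}
```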

In `@docs/design/2026_04_26_proposed_sqs_per_queue_throttling.md`:
- Around line 115-117: The doc currently suggests using a raw
buckets.Delete(key) for throttle invalidation which reintroduces the
orphan-bucket race; change the text to require the same
lock-then-delete-then-flag semantics as bucketStore.invalidateQueue (and
reference bucketKey/incarnation) instead of per-key Delete, i.e., require
implementers to call the invalidateQueue-style path that takes the lock, marks
the bucket invalid, and then removes entries so a pre-existing in-memory bucket
cannot be spent after invalidation; mention the exact symbols
buckets.Delete(key), bucketStore.invalidateQueue, and bucketKey so readers
locate the correct code paths.

In `@docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md`:
- Around line 227-234: The doc must state that X-Elastickv-Receive-Partition is
an internal-only hop hint: ensure the ingress path strips or ignores any
client-supplied X-Elastickv-Receive-Partition header and only the trusted proxy
path (proxyToLeader, extended to accept a partition argument) may set this
header when forwarding ReceiveMessage calls; update the leader-proxy machinery
description to show proxyToLeader adds X-Elastickv-Receive-Partition and sets
WaitTimeSeconds from remainingWait, and clarify that downstream handlers must
treat any externally provided header as untrusted (ignore/reject) so callers
cannot force single-partition reads or observe partial/false-empty
ReceiveMessage results.
- Around line 370-373: Remove the "Default `PartitionCount` for AWS-shape
callers" and the "`PartitionCount` per-shard vs per-queue" open-question
bullets: delete the text referencing a server-side default for PartitionCount
and the suggestion of choosing 8/16/32 so the doc no longer proposes a default
when FifoThroughputLimit = perMessageGroupId; instead leave the explicit
requirement from §7.2 that PartitionCount must be provided and omissions are
rejected, and retain the `--sqsDefaultFifoPartitionCount` flag discussion only
if it is framed as an operator-side convenience (not a server default).


📥 Commits

Reviewing files that changed from the base of the PR and between d1ce7c1 and 642ec0d.

📒 Files selected for processing (5)
  • adapter/sqs_catalog.go
  • adapter/sqs_throttle.go
  • adapter/sqs_throttle_test.go
  • docs/design/2026_04_26_proposed_sqs_per_queue_throttling.md
  • docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • adapter/sqs_throttle.go

…tionCount validator (PR #664 round 12 follow-up)

Two findings from the round-11 CodeRabbit review that were not in the user's 642ec0d commit:

(1) CodeRabbit Major (sqs_throttle.go:221) — chargeBucket retry loop exhaustion was fail-open. The 4-iteration retry loop's terminal fallback returned chargeOutcome{allowed: true}, turning the invalidate/reconcile race into a throttle bypass on the exact path meant to enforce limits. Fixed: terminal fallback now returns chargeOutcome{allowed: false, retryAfter: 1s, bucketPresent: false} so the client gets a normal back-off cue.

(2) CodeRabbit Major (split-queue-fifo §7.2) — the docs already require an explicit PartitionCount for HT-FIFO (landed in 642ec0d), but the validator did not enforce that on the wire. Added validatePartitionConfig rule that rejects FifoThroughputLimit=perMessageGroupId on PartitionCount<=1 with InvalidAttributeValue. Refactored validatePartitionConfig into validatePartitionShape + validateStandardQueueRejectsHTFIFO helpers to keep cyclop ≤ 10 once the new branch landed.

New unit test TestValidatePartitionConfig_PerMessageGroupIDRequiresExplicitPartitionCount covers the (PartitionCount=0, PartitionCount=1, PartitionCount=8) × FifoThroughputLimit accept/reject matrix.

Existing integration tests (Immutability / AllOrNothing / GetQueueAttributesRoundTrip) updated to use FifoThroughputLimit=perQueue rather than perMessageGroupId — perMessageGroupId is now reachable only when paired with PartitionCount > 1, which the §11 PR 2 dormancy gate currently rejects on the wire. perQueue is a legitimate single-partition FIFO setting and exercises the same immutability/roundtrip paths.

go test -race ./adapter/... pass; golangci-lint ./... clean.
@bootjp
Owner Author

bootjp commented Apr 28, 2026

The round-12 follow-up (55297f6a) addresses the two items from the round-11 CodeRabbit Major review that were not covered by 642ec0d4 (round 12 main):

(1) CodeRabbit Major (adapter/sqs_throttle.go:221 chargeBucket fail-open after retry exhaustion):

As flagged, the chargeBucket retry loop's terminal fallback returned chargeOutcome{allowed: true}, a fail-open that turned the invalidate/reconcile race into a rate-limit bypass on the exact path meant to enforce limits.

Fix: the terminal fallback now returns {allowed: false, retryAfter: 1s, bucketPresent: false}. On pathological exhaustion the client receives a normal back-off cue, and the next attempt almost certainly lands on a live bucket (sweep / reconciliation cannot keep evicting the same fresh bucket without time advancing).

(2) CodeRabbit Major (split-queue-fifo §7.2 PartitionCount implicit default):

642ec0d4 updated the §7.2 doc to "PartitionCount required for HT-FIFO", but the validator did not enforce that on the wire. Added a rule to validatePartitionConfig:

  • meta.IsFIFO && meta.FifoThroughputLimit == htfifoThroughputPerMessageGroupID && meta.PartitionCount <= 1 → reject with InvalidAttributeValue
  • the new rule pushed validatePartitionConfig past cyclop=10, so it is refactored into validatePartitionShape (PartitionCount power-of-two + cap) and validateStandardQueueRejectsHTFIFO (FIFO-only attribute rules)

New unit test: TestValidatePartitionConfig_PerMessageGroupIDRequiresExplicitPartitionCount covers the accept/reject matrix of (PartitionCount ∈ {0, 1, 8}) × (perMessageGroupId / perQueue / unset).

Existing integration tests (Immutability / AllOrNothing / GetQueueAttributesRoundTrip) switched to perQueue: the new rule combined with the dormancy gate (§11 PR 2) makes perMessageGroupId unreachable on the wire. perQueue is a legitimate single-partition FIFO setting and exercises the same immutability / roundtrip paths.

go test -race ./adapter/... clean; golangci-lint ./... clean.

@claude review

@claude

claude Bot commented Apr 28, 2026

Claude finished @bootjp's task in 8m 2s


Review

  • Read adapter/sqs_throttle.go — fail-closed fix for retry exhaustion
  • Read adapter/sqs_partitioning.go — PartitionCount validation refactor
  • Read adapter/sqs_partitioning_test.go — new PerMessageGroupID test
  • Read adapter/sqs_partitioning_integration_test.go — updated to perQueue
  • Read adapter/sqs_throttle_test.go — check for related test changes
  • Check design docs for consistency with implementation
  • Check for new issues
  • Overall assessment

Both trigger fixes from round 12 are correctly applied. Three still-open CodeRabbit findings from the latest round (posted concurrently with 55297f6) plus one new low-severity doc finding.


Both trigger fixes: resolved ✓

CodeRabbit Major — chargeBucket fail-closed on retry exhaustion
sqs_throttle.go:229-233: the terminal fallback now returns chargeOutcome{allowed: false, retryAfter: time.Second, bucketPresent: false}. The comment correctly explains the pathological scenario (sweep/reconciliation cannot repeatedly evict the same fresh bucket without time advancing past evictedAfter) and the rationale for returning a throttling cue over a 500 (next attempt will almost certainly land on the now-stable bucket). The existing TestBucketStore_OrphanedBucketRetriesToLiveEntry tests the single-retry happy path; the 4-retry exhaustion path is genuinely pathological and adding a deterministic test would require injecting 4 consecutive evictions of freshly-inserted buckets, which is out of scope for this PR.
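
For readers without the diff in front of them, a minimal sketch of the fail-closed shape being verified here, under stated assumptions: chargeOutcome, the evicted flag, and the 4-attempt bound come from the review text, while bucket, bucketStore, lookupOrMint, and the refill arithmetic are simplified stand-ins for the real adapter types, not the actual implementation.

package main

import (
	"fmt"
	"sync"
	"time"
)

// bucket is a simplified token bucket; evicted is the flag invalidateQueue
// sets so a stale pointer can never be spent against.
type bucket struct {
	mu         sync.Mutex
	evicted    bool
	tokens     float64
	capacity   float64
	refillRate float64 // tokens per second
	lastRefill time.Time
}

type chargeOutcome struct {
	allowed       bool
	retryAfter    time.Duration
	bucketPresent bool
}

type bucketStore struct {
	mu      sync.Mutex
	buckets map[string]*bucket
}

func (s *bucketStore) lookupOrMint(key string, capacity, refill float64) *bucket {
	s.mu.Lock()
	defer s.mu.Unlock()
	if b, ok := s.buckets[key]; ok {
		return b
	}
	b := &bucket{tokens: capacity, capacity: capacity, refillRate: refill, lastRefill: time.Now()}
	s.buckets[key] = b
	return b
}

// chargeBucket retries when it loses the invalidate/reconcile race; the
// terminal fallback is fail-closed: a throttling cue, never a silent allow.
func (s *bucketStore) chargeBucket(key string, cost, capacity, refill float64) chargeOutcome {
	for attempt := 0; attempt < 4; attempt++ {
		b := s.lookupOrMint(key, capacity, refill)
		b.mu.Lock()
		if b.evicted {
			b.mu.Unlock() // stale pointer: loop around and mint a fresh entry
			continue
		}
		now := time.Now()
		b.tokens = min(b.capacity, b.tokens+b.refillRate*now.Sub(b.lastRefill).Seconds())
		b.lastRefill = now
		if b.tokens >= cost {
			b.tokens -= cost
			b.mu.Unlock()
			return chargeOutcome{allowed: true, bucketPresent: true}
		}
		wait := time.Duration((cost - b.tokens) / b.refillRate * float64(time.Second))
		b.mu.Unlock()
		return chargeOutcome{allowed: false, retryAfter: wait, bucketPresent: true}
	}
	return chargeOutcome{allowed: false, retryAfter: time.Second, bucketPresent: false}
}

func main() {
	s := &bucketStore{buckets: map[string]*bucket{}}
	fmt.Printf("%+v\n", s.chargeBucket("q1:Send", 1, 10, 5))
}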

CodeRabbit Major — validatePartitionConfig requires explicit PartitionCount for perMessageGroupId
sqs_partitioning.go:166-169: new rule correctly checks IsFIFO && FifoThroughputLimit == perMessageGroupId && PartitionCount <= 1 and rejects with InvalidAttributeValue. The two-way interaction with the dormancy gate is sound:

  • perMessageGroupId + no PartitionCount → schema validator rejects (new rule) ✓
  • perMessageGroupId + PartitionCount = 8 → schema validator passes, dormancy gate rejects (existing rule) ✓

The refactor into validatePartitionShape + validateStandardQueueRejectsHTFIFO is clean — both helpers have a clear single responsibility, the cyclop ceiling is respected, and the original behaviour is preserved for all existing test cases.
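
As a rough illustration of the described split (not the repository's actual code: the helper bodies are guesses from their names, and the error envelope is simplified to a plain error):

package main

import "fmt"

const htfifoThroughputPerMessageGroupID = "perMessageGroupId"

type sqsQueueMeta struct {
	IsFIFO              bool
	FifoThroughputLimit string
	PartitionCount      uint32
}

func errInvalidAttributeValue(msg string) error {
	return fmt.Errorf("InvalidAttributeValue: %s", msg)
}

// validatePartitionShape: PartitionCount must be 0 (legacy layout) or a
// power of two no greater than 32.
func validatePartitionShape(m sqsQueueMeta) error {
	if n := m.PartitionCount; n != 0 && (n&(n-1) != 0 || n > 32) {
		return errInvalidAttributeValue("PartitionCount must be a power of two <= 32")
	}
	return nil
}

// validateStandardQueueRejectsHTFIFO: HT-FIFO attributes only make sense on
// FIFO queues.
func validateStandardQueueRejectsHTFIFO(m sqsQueueMeta) error {
	if !m.IsFIFO && (m.FifoThroughputLimit != "" || m.PartitionCount > 1) {
		return errInvalidAttributeValue("HT-FIFO attributes require a FIFO queue")
	}
	return nil
}

func validatePartitionConfig(m sqsQueueMeta) error {
	if err := validatePartitionShape(m); err != nil {
		return err
	}
	if err := validateStandardQueueRejectsHTFIFO(m); err != nil {
		return err
	}
	// The new rule: perMessageGroupId with no explicit multi-partition count
	// is rejected; there is no server-side PartitionCount default (§7.2).
	if m.IsFIFO && m.FifoThroughputLimit == htfifoThroughputPerMessageGroupID && m.PartitionCount <= 1 {
		return errInvalidAttributeValue("FifoThroughputLimit=perMessageGroupId requires PartitionCount > 1")
	}
	return nil
}

func main() {
	fmt.Println(validatePartitionConfig(sqsQueueMeta{IsFIFO: true, FifoThroughputLimit: htfifoThroughputPerMessageGroupID}))
}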

Integration tests updated to perQueue
mustCreateFIFOWithThroughputLimit now creates with htfifoThroughputPerQueue rather than htfifoThroughputPerMessageGroupID. This is correct: perMessageGroupId is now rejected at the wire today (no PartitionCount → schema reject, with PartitionCount → dormancy gate reject). perQueue is a legitimate single-partition FIFO setting that exercises the same immutability/round-trip paths without hitting either gate. The ImmutabilitySetQueueAttributesRejects test correctly now tries to flip to perMessageGroupId (which also rejects, via both immutability and the new schema rule), and the same-value no-op still passes. ✓

New unit test
TestValidatePartitionConfig_PerMessageGroupIDRequiresExplicitPartitionCount covers the accept/reject matrix across PartitionCount ∈ {0, 1, 8} and FifoThroughputLimit ∈ {perMessageGroupId, perQueue, unset}. The assertion for PartitionCount=8 + perMessageGroupId correctly checks that the schema validator alone does not reject it (the dormancy gate is a separate call, tested separately). ✓
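
A hedged reconstruction of what that matrix might look like as a table-driven test, reusing the sketch types above in the same package (with "testing" imported); the real test's case list and assertions may differ:

func TestValidatePartitionConfig_PerMessageGroupIDRequiresExplicitPartitionCount(t *testing.T) {
	cases := []struct {
		partitions uint32
		limit      string
		wantErr    bool
	}{
		{0, "perMessageGroupId", true},  // no explicit count: reject
		{1, "perMessageGroupId", true},  // single partition: reject
		{8, "perMessageGroupId", false}, // schema passes; the dormancy gate is a separate call
		{0, "perQueue", false},
		{1, "perQueue", false},
		{8, "perQueue", false},
		{0, "", false},
		{1, "", false},
		{8, "", false},
	}
	for _, tc := range cases {
		m := sqsQueueMeta{IsFIFO: true, FifoThroughputLimit: tc.limit, PartitionCount: tc.partitions}
		if err := validatePartitionConfig(m); (err != nil) != tc.wantErr {
			t.Errorf("PartitionCount=%d limit=%q: err=%v, wantErr=%v", tc.partitions, tc.limit, err, tc.wantErr)
		}
	}
}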


Still-open CodeRabbit findings from the latest round (posted concurrently with 55297f6)

[Medium] adapter/sqs_catalog.go:493 — //nolint:gosec for PartitionCount parsing

m.PartitionCount = uint32(n) //nolint:gosec // bounded by ParseUint(_, _, 32) above.

CLAUDE.md explicitly requires: "Avoid //nolint — refactor instead." The fix is a one-line change: parse with strconv.ParseUint(_, _, 64), add an explicit if n > math.MaxUint32 bounds check with the same InvalidAttributeValue error, then cast. math is already imported in this file. Fix this →
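
The shape of that fix, sketched (parsePartitionCount and the error strings here are illustrative stand-ins, not the adapter's actual symbols or its newSQSAPIError envelope):

package main

import (
	"fmt"
	"math"
	"strconv"
)

// parsePartitionCount parses at 64-bit width, bound-checks explicitly, then
// narrows: gosec's G115 is satisfied without any //nolint suppression.
func parsePartitionCount(raw string) (uint32, error) {
	n, err := strconv.ParseUint(raw, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("InvalidAttributeValue: PartitionCount must be numeric")
	}
	if n > math.MaxUint32 {
		return 0, fmt.Errorf("InvalidAttributeValue: PartitionCount is out of range")
	}
	return uint32(n), nil // provably in range: the narrowing cannot truncate
}

func main() {
	fmt.Println(parsePartitionCount("8"))
	fmt.Println(parsePartitionCount("4294967296")) // MaxUint32+1: rejected
}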

[Medium] docs/design/2026_04_26_proposed_sqs_per_queue_throttling.md §3.1 line 115 — doc says buckets.Delete(key), code uses invalidateQueue

Line 115 says SetQueueAttributes "calls buckets.Delete(key) for every bucketKey belonging to the updated queue". But sqs_catalog.go:1457 correctly calls s.throttle.invalidateQueue(name) — the lock-then-delete-then-flag protocol that avoids the orphan-bucket race. A raw buckets.Delete(key) would reintroduce that race: a charger holding the old pointer before deletion could spend against the orphaned bucket, then the next request mints a fresh full-capacity bucket — a transient double-allotment window. The code is correct; the doc needs to say invalidateQueue-style semantics (lock + delete + flag) rather than per-key Delete. Fix this →
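
In sketch form, reusing the bucket and bucketStore shapes from the chargeBucket sketch above (add "strings" to its imports; lock granularity is simplified, and the real code is described as using CompareAndDelete rather than a plain delete):

// invalidateQueue holds the store lock while it both removes the map entry
// and flags the bucket, so a charger that grabbed the old pointer before
// deletion observes evicted=true and retries instead of spending. A raw
// buckets.Delete(key) would leave that charger a window to spend against
// the orphaned bucket while the next request mints a fresh full-capacity one.
func (s *bucketStore) invalidateQueue(queue string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for key, b := range s.buckets {
		if !strings.HasPrefix(key, queue+":") { // assumes a queue-scoped bucketKey prefix
			continue
		}
		delete(s.buckets, key)
		b.mu.Lock()
		b.evicted = true
		b.mu.Unlock()
	}
}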

[Medium] docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md §4.2 — X-Elastickv-Receive-Partition not documented as internal-only

Lines 231-232 describe the X-Elastickv-Receive-Partition header without stating it is an internal hop-hint only. A malicious client that sends this header could force single-partition reads and observe false-empty responses or targeted partition drains — exactly the forwarding-semantics gap the fanout was designed to close. The doc should explicitly state: (a) ingress strips or ignores any client-supplied X-Elastickv-Receive-Partition header; (b) only the trusted proxyToLeader path may set it. Without this clause the implementation PR author might not add the ingress-strip, leaving clients with a vector to bypass the fan-out logic. Fix this →
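
A sketch of the ingress strip the note mandates; stripInternalHeaders and addPartitionHint are hypothetical names for illustration, not the adapter's handlers:

package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

const receivePartitionHeader = "X-Elastickv-Receive-Partition"

// stripInternalHeaders runs at ingress: any client-supplied copy of the
// internal partition hint is dropped unconditionally, so a forged value can
// neither force single-partition reads nor fake empty ReceiveMessage results.
func stripInternalHeaders(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		r.Header.Del(receivePartitionHeader)
		next.ServeHTTP(w, r)
	})
}

// addPartitionHint is the trusted counterpart: only the internal
// leader-forwarding hop sets the header on an outbound request.
func addPartitionHint(req *http.Request, partition int) {
	req.Header.Set(receivePartitionHeader, strconv.Itoa(partition))
}

func main() {
	h := stripInternalHeaders(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Printf("hint after ingress: %q\n", r.Header.Get(receivePartitionHeader)) // always ""
	}))
	req := httptest.NewRequest(http.MethodPost, "/", nil)
	req.Header.Set(receivePartitionHeader, "0") // malicious client tries to pin partition 0
	h.ServeHTTP(httptest.NewRecorder(), req)
}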


New finding

[Low] docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md §10 — Open Questions items 3 and 4 are stale

§7.2 explicitly resolves the "PartitionCount default" question: "PartitionCount is required whenever multi-partition FIFO is requested … There is no server-side default." The new validator in sqs_partitioning.go now enforces this at the wire. Yet §10 item 3 still asks "what default?" and item 4 proposes a --sqsDefaultFifoPartitionCount flag — directly contradicting §7.2. An implementor reading §10 after §7.2 would be confused about whether the question is settled. Both items should be removed or replaced with a note that the question is resolved (reject with InvalidAttributeValue). Fix this →


Summary

Both round-12 trigger fixes are cleanly applied. The three CodeRabbit findings that were posted concurrently with 55297f6 (//nolint:gosec in sqs_catalog.go, doc vs code divergence on the SetQueueAttributes invalidation path, and the X-Elastickv-Receive-Partition internal-only header) are the remaining items before implementation PRs for Phase 3.D data-plane work start. The Open Questions stale items are low-severity cleanup.

…review)

Four findings from the round-12 Claude review on commit 55297f6:

(1) Medium (sqs_catalog.go:493) — //nolint:gosec on the PartitionCount uint64→uint32 narrowing. CLAUDE.md requires refactor over suppression. Reparsed at uint64 width with an explicit math.MaxUint32 bounds check before the narrowing; gosec is happy without the comment.

(2) Medium (per-queue-throttling §3.1 line 115) — doc said SetQueueAttributes 'calls buckets.Delete(key) for every bucketKey' while the implementation calls invalidateQueue (the lock-then-CompareAndDelete-then-flag protocol that closes the orphan-bucket race). Updated the paragraph to describe invalidateQueue semantics and explain why per-key Delete would reintroduce the race that round 5 / round 6.1 closed. Also called out the no-op-write gate added in round 12.

(3) Medium (split-queue-fifo §4.2) — X-Elastickv-Receive-Partition was described as a header proxyToLeader sets but not as internal-only. Added an explicit security note: ingress strips/ignores any client-supplied X-Elastickv-Receive-Partition before fan-out runs; only the trusted proxyToLeader path may set it. Without that strip, a malicious client could force single-partition reads, observe false-empty results for groups that hash to other partitions, or repeatedly drain a targeted partition. Phase 3.D PR 5 lands the strip alongside the fan-out.

(4) Low (split-queue-fifo §10) — Open Questions items 3 ('default PartitionCount?') and 4 ('--sqsDefaultFifoPartitionCount flag?') directly contradict §7.2's 'PartitionCount required, no server-side default' resolution. Both items removed; the slot replaced with an explicit 'these were resolved' parenthetical pointing at §7.2 and the validator. The cross-partition-ordering item (was #5) is renumbered to #3.

go test -race ./adapter/... pass; golangci-lint ./... clean.
@bootjp
Owner Author

bootjp commented Apr 28, 2026

Round 13 (7e434888) addresses the four items (Medium x3 + Low) from the round-12 Claude review:

(1) Medium (adapter/sqs_catalog.go:493 //nolint:gosec for PartitionCount):

As flagged, CLAUDE.md requires refactoring instead of //nolint. Changed strconv.ParseUint(_, _, 32) to ParseUint(_, _, 64) and added an explicit if n > math.MaxUint32 bounds check before the narrowing. The math import already existed; gosec G115 is now quiet with no //nolint needed.

(2) Medium (docs/design/.../proposed_sqs_per_queue_throttling.md §3.1 line 115 doc divergence):

As flagged, §3.1 described the SetQueueAttributes cache invalidation as per-action buckets.Delete(key), while the implementation uses invalidateQueue (the lock-then-CompareAndDelete-then-flag protocol that closed the orphan-bucket race in rounds 5 / 6.1). Rewrote the doc to describe invalidateQueue semantics, noting that a raw per-key Delete would reintroduce the closed race. The round-12 no-op gate (a same-value SetQueueAttributes does not reset buckets) is folded into the same paragraph.

(3) Medium (docs/design/.../proposed_sqs_split_queue_fifo.md §4.2 X-Elastickv-Receive-Partition internal-only):

As flagged, the doc described X-Elastickv-Receive-Partition as a hop hint set by proxyToLeader but never stated it is internal-only, leaving a path where a malicious client sending the header straight to the public endpoint could force single-partition reads or repeatedly drain a targeted partition. Added an explicit security note: the doc now requires the ingress handler to strip / unconditionally ignore any client-supplied value, records that the strip lands inside the Phase 3.D PR 5 handler change, and notes the addition to the §9 integration-test plan.

(4) Low (docs/design/.../proposed_sqs_split_queue_fifo.md §10 Open Questions stale items 3+4):

As flagged, §10 item 3 ("default PartitionCount?") and item 4 ("--sqsDefaultFifoPartitionCount flag?") directly contradicted the §7.2 decision that PartitionCount is required with no server-side default. Removed both items and replaced the slot with a parenthetical noting they were resolved by §7.2, with a pointer to the reason (CreateQueue idempotency depending on hidden deployment state is a hazard). Former item 5 (cross-partition ordering) is renumbered to item 3.

go test -race ./adapter/... clean; golangci-lint ./... clean.

@claude review

@claude

claude Bot commented Apr 28, 2026

Claude finished @bootjp's task in 3m 0s —— View job


Review

  • Read adapter/sqs_catalog.go — PartitionCount //nolint:gosec fix
  • Read docs/design/2026_04_26_proposed_sqs_per_queue_throttling.md — §3.1 invalidateQueue doc fix
  • Read docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md — §4.2 internal-only header + §10 stale items
  • Verify all 4 trigger fixes
  • Check for new issues
  • Overall assessment

All four fixes from this push are correctly applied. Two new low-severity findings.


All four trigger fixes: resolved ✓

adapter/sqs_catalog.go — PartitionCount //nolint:gosec removed (medium)
strconv.ParseUint(_, _, 64) + explicit if n > math.MaxUint32 bounds check before the uint32(n) cast. math was already imported. gosec G115 is satisfied without any suppression annotation. The guard returns InvalidAttributeValue("PartitionCount is out of range") for values > 4,294,967,295 — correctly consistent with the existing error envelope for other out-of-range attributes.

docs/design/.../proposed_sqs_per_queue_throttling.md §3.1 — invalidateQueue vs per-key Delete (medium)
The rewritten paragraph correctly describes invalidateQueue's lock-then-CompareAndDelete-then-evicted=true ordering and explains why a raw per-key buckets.Delete would reintroduce the orphan-bucket race: a charger holding the old pointer pre-Delete acquires the bucket's mu after the map entry is gone, sees evicted=false, spends a token, then the next request mints a fresh full-capacity bucket — a transient double-allotment window. The no-op gate (same-value SetQueueAttributes does not reset) from round 12 is folded into the same paragraph. Cross-reference to DeleteQueue/CreateQueue paragraph below is accurate.

docs/design/.../proposed_sqs_split_queue_fifo.md §4.2 — X-Elastickv-Receive-Partition internal-only (medium)
The security note is accurate and appropriately alarming. The failure modes are spelled out concretely: force single-partition reads, observe false-empty responses for other groups, drain a targeted partition. The attribution to Phase 3.D PR 5 for the ingress-strip implementation and the §9 test-plan pointer are present. One issue with the §9 pointer — see below.

docs/design/.../proposed_sqs_split_queue_fifo.md §10 stale items 3 + 4 removed (low)
Items 3 ("Default PartitionCount for AWS-shape callers") and 4 ("--sqsDefaultFifoPartitionCount flag") removed and replaced with an explicit "resolved" parenthetical citing §7.2 and validatePartitionConfig. Item 5 (cross-partition ordering for visibility) is correctly renumbered to 3. The resolved note accurately points at the mechanism (validatePartitionConfig in adapter/sqs_partitioning.go).


New findings

[Low] docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md §4.2 — §9 cross-reference is a dangling pointer

The new security note (line 240) ends with:

"the integration-test plan in §9 covers a 'client-set internal header is ignored' case"

But §9's test list (items 1–5, plus Jepsen and metrics) does not include this test. The items cover:

  1. Unit tests for partitionFor distribution/determinism/edge cases
  2. End-to-end delivery
  3. Receipt-handle round-trip
  4. SetQueueAttributes immutability enforcement
  5. WaitTimeSeconds shared-deadline bound

None cover the header-strip security property. An implementation PR author reading §9 for guidance on what to test would miss this requirement entirely; the cross-reference gives false confidence. Fix: add a bullet to §9 item 2 (end-to-end) or a new item for the header-strip case — e.g. "Send ReceiveMessage with X-Elastickv-Receive-Partition: 0 from a direct client (not via proxyToLeader); assert the full N-partition fanout runs regardless of the header value, and the response is identical to a call without the header." Fix this →

[Low] adapter/sqs_catalog.go:488–491 — comment references PR number and violates CLAUDE.md

The comment added for the bounds check reads:

// Parse at uint64 width and bound-check explicitly so the
// uint32 narrowing is gosec-clean without a //nolint comment
// (CLAUDE.md asks us to refactor rather than suppress; Claude
// medium on PR #664 round 12 flagged the suppression).

CLAUDE.md is explicit: "Don't reference the current task, fix, or callers ('added for the Y flow', 'handles the case from issue #123'), since those belong in the PR description and rot as the codebase evolves." The "Claude medium on PR #664 round 12" parenthetical will be meaningless to a future reader who doesn't have the PR context — and it's already in the commit message. The first two lines are the useful WHY; drop the parenthetical. Fix this →


Summary

All four trigger fixes are cleanly applied. The two new findings are both low-severity. The §9 dangling pointer (the header-strip test that §4.2 says is there but isn't) is the only item I'd want resolved before calling the doc complete — it creates false confidence that a test the implementation PR needs to write has already been specified. The PR-number comment is a style/maintainability issue with no correctness impact.

bootjp added 2 commits April 29, 2026 01:30
…t (PR #664 round 13.1)

Two low-severity findings from the round-13 Claude review:

(1) §4.2 said the integration-test plan in §9 'covers a client-set internal header is ignored case', but §9's test list did not actually include one. Dangling pointer that would have given false confidence to the implementation PR author. Added test item 6 (TestSQSServer_HTFIFO_Internal*HeaderStrip): create a 4-partition queue with one message hashed to each partition, send a ReceiveMessage with X-Elastickv-Receive-Partition: 0 from a direct client, assert all 4 messages come back (full fanout ran). Renumbered Jepsen item 6 → 7 and metrics item 7 → 8.

(2) sqs_catalog.go PartitionCount parser comment referenced 'Claude medium on PR #664 round 12'. CLAUDE.md is explicit that comments should not reference task / PR numbers — they rot as the codebase evolves. Trimmed to the substantive WHY (uint64 parse + bounds check keeps the narrowing gosec-clean) and dropped the parenthetical.

go test -race ./adapter/... pass; golangci-lint ./... clean.
@bootjp
Owner Author

bootjp commented Apr 28, 2026

Round 13.1 (8d6e2143) addresses the two Low findings from the round-13 Claude review:

(1) Low (docs/design/.../proposed_sqs_split_queue_fifo.md §4.2 dangling §9 cross-reference):

As flagged, the §4.2 security note claimed "the integration-test plan in §9 covers a 'client-set internal header is ignored' case", but §9's test list had no such item: a dangling pointer that would give false confidence to an implementation PR author using §9 as guidance.

Fix: added new item 6 to §9 (X-Elastickv-Receive-Partition ingress strip): create a 4-partition queue with one message landing in each partition, have a direct client call ReceiveMessage(MaxNumberOfMessages=4) with the X-Elastickv-Receive-Partition: 0 header, and assert that all 4 messages come back (confirming the full fanout ran). Companion variant: assert the same response comes back without the header (the strip path introduces no divergence). Renumbered the Jepsen item 6→7 and the metrics item 7→8.

(2) Low (adapter/sqs_catalog.go PartitionCount parser comment references PR number):

As flagged, CLAUDE.md forbids referencing task / PR numbers in code comments ("they rot as the codebase evolves"). The previous round's comment contained "(CLAUDE.md asks us to refactor rather than suppress; Claude medium on PR #664 round 12 flagged the suppression)"; that parenthetical is deleted. Only the substantive WHY remains ("Parse at uint64 width and bound-check explicitly so the uint32 narrowing below is gosec-clean.").

go test -race ./adapter/... clean; golangci-lint ./... clean.

@claude review

@claude

claude Bot commented Apr 28, 2026

Claude finished @bootjp's task in 2m 24s


Review

  • Read Round 13.1 changes
  • Verify fix: §9 item 6 X-Elastickv-Receive-Partition ingress strip test (dangling cross-reference)
  • Verify fix: sqs_catalog.go PartitionCount parser comment — PR number removed
  • Check for new issues
  • Overall assessment

Both Round 13.1 fixes are correctly applied. One new low-severity finding.


Both trigger fixes: resolved ✓

3.D — §9 item 6 X-Elastickv-Receive-Partition ingress strip test (low)
Lines 355–359 in docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md correctly implement the new test item. The description:

  • Creates a 4-partition queue, sends one message per partition (by MessageGroupId hash)
  • Calls ReceiveMessage(MaxNumberOfMessages=4) with X-Elastickv-Receive-Partition: 0 set directly on the client request
  • Asserts all 4 messages are returned (full fanout ran, client header ignored)
  • Companion variant: same call without the header, asserts byte-identical response (strip doesn't introduce divergence)

The test spec is concrete enough for an implementation PR author to write — it names the regression scenario (a naive implementation would return at most 1 message), describes both the security path and the non-regression companion. The §4.2 cross-reference at line 240 now has a real test to point to.
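
If it helps the eventual implementation PR, the spec above might translate to something like the following; every helper here (newTestSQSServer, createFIFOQueue, send, groupForPartition, receive, receiveWithHeader) is a hypothetical harness name for illustration, not an existing symbol in the repository:

func TestSQSServer_HTFIFO_InternalReceivePartitionHeaderStrip(t *testing.T) {
	srv := newTestSQSServer(t)               // hypothetical test harness
	q := srv.createFIFOQueue(t, 4)           // PartitionCount = 4
	for p := 0; p < 4; p++ {
		srv.send(t, q, groupForPartition(p)) // one MessageGroupId hashing to each partition
	}
	// Direct client sets the internal header; ingress must strip it before fanout.
	got := srv.receiveWithHeader(t, q, 4, "X-Elastickv-Receive-Partition", "0")
	if len(got.Messages) != 4 {
		t.Fatalf("expected full 4-partition fanout, got %d messages", len(got.Messages)) // a naive impl returns at most 1
	}
	// Companion variant on a fresh, identically loaded queue: no header, same result.
	q2 := srv.createFIFOQueue(t, 4)
	for p := 0; p < 4; p++ {
		srv.send(t, q2, groupForPartition(p))
	}
	if plain := srv.receive(t, q2, 4); len(plain.Messages) != 4 {
		t.Fatalf("no-header variant diverged: got %d messages", len(plain.Messages))
	}
}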

adapter/sqs_catalog.go PartitionCount comment (low)
Lines 488–489 now read:

// Parse at uint64 width and bound-check explicitly so the
// uint32 narrowing below is gosec-clean.

No PR number, no task reference. The substantive WHY (why the 64-bit parse + bounds check) is preserved; the parenthetical that attributed it to a specific review round is gone. Clean per CLAUDE.md.


New finding

[Low] adapter/sqs_throttle.go, adapter/sqs_catalog.go, adapter/sqs_partitioning.go — pervasive PR-number references in implementation code comments

Round 13.1 fixed the one instance I flagged (sqs_catalog.go line 489). However, the same CLAUDE.md pattern — "Don't reference the current task, fix, or callers ('handles the case from issue #123')" — applies across all three implementation files. Examples:

  • sqs_throttle.go:35: "(Codex P1 on PR #679)"
  • sqs_throttle.go:97: "(Codex P2 on PR #664 round 9)"
  • sqs_throttle.go:225: "(CodeRabbit Major on PR #664 round 11)"
  • sqs_catalog.go:89: "(Codex P2 on PR #664 round 9)"
  • sqs_partitioning.go:90: "(CodeRabbit nit on PR #664 round 7)"

There are ~20+ such instances across the three files. Each correctly describes the WHY in the first part of the comment, but the attribution parenthetical will be meaningless to a reader who doesn't have PR #664's context and will gradually become misleading as the code evolves. CLAUDE.md is clear: these belong in the PR description, not the code.

The design docs (.md files) are archival records — PR/review references there are a documentation practice rather than a code comment convention and are acceptable. The Go implementation files are the CLAUDE.md-governed surface.

This is a broad style cleanup, not a correctness issue, and does not block any implementation PRs. I'd suggest batching the removals into a single follow-up commit rather than addressing one at a time. Fix this →


Summary

Both Round 13.1 fixes are cleanly applied. The one new finding is a low-severity style cleanup — pervasive PR-number attribution comments in implementation files that violate CLAUDE.md's "don't reference issue numbers in code" guideline. It doesn't block any implementation PRs and is best addressed in a single follow-up cleanup commit. The proposal and implementation are otherwise in good shape.



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d6e2143ff


Comment thread adapter/sqs_throttle.go
Comment on lines +541 to +544
const capSecs = float64(throttleRetryAfterCap / time.Second)
if secs > capSecs {
	secs = capSecs
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Return uncapped Retry-After for slow refill buckets

computeRetryAfter hard-caps waits at 1 hour, which underestimates recovery time for valid low-refill configs and larger token costs. For example, with capacity=10, refill=0.001, and a 10-entry batch charge, the true wait is 10,000s but the response says 3,600s, so compliant clients will retry too early and get throttled again repeatedly. This also defeats the comment’s eviction rationale because each retry updates lastRefill, so the bucket does not idle-evict. Please compute Retry-After from the actual refill math (or constrain config so the cap is never below the true wait).
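
Codex's arithmetic, made concrete in a sketch (computeRetryAfter here is a stand-in with a simplified signature, not the adapter's function):

package main

import (
	"fmt"
	"time"
)

// computeRetryAfter derives the wait from the actual token deficit and refill
// rate, with no fixed cap, so slow-refill configs report an honest horizon.
func computeRetryAfter(tokens, cost, refillPerSec float64) time.Duration {
	deficit := cost - tokens
	if deficit <= 0 {
		return 0 // already affordable; no wait
	}
	return time.Duration(deficit / refillPerSec * float64(time.Second))
}

func main() {
	// Codex's example: bucket drained to 0 tokens, refill = 0.001 tokens/s,
	// a 10-entry batch costs 10 tokens. The true wait is 10,000s (2h46m40s),
	// well above the 1-hour cap the reviewed code would report.
	fmt.Println(computeRetryAfter(0, 10, 0.001))
}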


…md (PR #664 round 14)

Claude low on round 13.1: ~25+ comments across the SQS Go files reference review attributions ('Codex P1 on PR #679', 'CodeRabbit Major on PR #664 round 11', etc.). CLAUDE.md is explicit: 'Don't reference the current task, fix, or callers ... since those belong in the PR description and rot as the codebase evolves.'

Sweep across:
- adapter/sqs_throttle.go
- adapter/sqs_catalog.go
- adapter/sqs_partitioning.go
- adapter/sqs_messages.go
- adapter/sqs_throttle_test.go
- adapter/sqs_partitioning_test.go
- adapter/sqs_throttle_integration_test.go
- adapter/sqs_partitioning_integration_test.go
- adapter/sqs_catalog_test.go
- adapter/sqs_query_protocol.go
- adapter/sqs_query_protocol_test.go

Each comment now retains the substantive WHY (why the design choice was made) and drops the parenthetical attribution to a specific reviewer / PR number / round. Design docs (.md files) are exempt — those are archival records where review attribution is acceptable practice.

go test -race ./adapter/... pass; golangci-lint ./... clean. No behaviour change.
@bootjp
Owner Author

bootjp commented Apr 28, 2026

Round 14 (213ded44) addresses the Low from the round-13.1 Claude review (pervasive PR-number references in implementation code comments):

As flagged, CLAUDE.md explicitly forbids referencing tasks / PR numbers / review attributions in code comments ("they rot as the codebase evolves"), yet ~25+ comments like "Codex P1 on PR #679" and "CodeRabbit Major on PR #664 round 11" were scattered across the SQS Go files.

Fix: swept across 11 files:

  • adapter/sqs_throttle.go
  • adapter/sqs_catalog.go
  • adapter/sqs_partitioning.go
  • adapter/sqs_messages.go
  • adapter/sqs_throttle_test.go
  • adapter/sqs_partitioning_test.go
  • adapter/sqs_throttle_integration_test.go
  • adapter/sqs_partitioning_integration_test.go
  • adapter/sqs_catalog_test.go
  • adapter/sqs_query_protocol.go
  • adapter/sqs_query_protocol_test.go

Each comment keeps its substantive WHY (why the design decision was made); only the parenthetical attribution to a reviewer / PR number / round is removed. Design docs (.md) are exempt: they are archival records where review attribution is acceptable practice (the Claude review reached the same judgment).

Verified with grep -nE "round [0-9]+|PR #[0-9]+|Round [0-9]+|round-[0-9]+" adapter/sqs_*.go: zero remaining references.

go test -race ./adapter/... clean; golangci-lint ./... clean. No behaviour change.

@claude review

@claude

claude Bot commented Apr 28, 2026

Claude finished @bootjp's task in 4m 19s


Review

  • Read Round 14 changes (PR-number reference sweep)
  • Verify substantive WHY content preserved in all modified files
  • Confirm no PR/round number references remain in Go files
  • Check design docs were correctly exempted
  • Check for new issues
  • Overall assessment

All eleven files swept in Round 14 are correctly updated. No PR numbers or round references remain in any Go file. One small residue finding noted below.


Round 14 trigger fix: resolved ✓

PR-number and round attribution sweep (low)
All (Codex P1 on PR #679), (CodeRabbit Major on PR #664 round 11), (Gemini high-priority on PR #662), (Claude review on PR #681 round 2), etc. parentheticals are removed across:

  • adapter/sqs_throttle.go — 10 attributions removed
  • adapter/sqs_catalog.go — 7 attributions removed
  • adapter/sqs_partitioning.go — 5 attributions removed
  • adapter/sqs_messages.go — 2 attributions removed
  • adapter/sqs_throttle_test.go — 13 attributions removed
  • adapter/sqs_partitioning_test.go — 5 attributions removed
  • adapter/sqs_throttle_integration_test.go — 1 attribution removed
  • adapter/sqs_partitioning_integration_test.go — 1 attribution removed
  • adapter/sqs_catalog_test.go — 1 attribution removed
  • adapter/sqs_query_protocol.go — 2 attributions removed (lines 151 and 511)
  • adapter/sqs_query_protocol_test.go — 1 attribution removed (line 151)

The substantive WHY is preserved in every case. The technical reasoning travels in the surrounding sentences, not in the attribution parenthetical. Representative example from sqs_throttle.go:

-// state via a no-op SetQueueAttributes (Codex P1 on PR #679).
+// state via a no-op SetQueueAttributes.

The preceding sentence explains the exact failure mode; the attribution was redundant.

Design docs correctly exempted
No docs/design/*.md files appear in the Round 14 diff. The two proposal docs retain their PR/round references as archival records — consistent with what was agreed in round 13.1.


New finding

[Low] adapter/sqs_query_protocol.go and adapter/sqs_query_protocol_test.go — four reviewer-tool-name references missed by the sweep

Round 14 removed the PR-number-bearing instances from sqs_query_protocol.go (line 151: Gemini high-priority on PR #662; line 511: Gemini medium on PR #662) and sqs_query_protocol_test.go (line 151: Gemini medium on PR #662), but left four instances where the reviewer names appear without a PR number:

  • sqs_query_protocol.go:232: "SQS query reference is explicit on this and CodexP1 / Gemini"
  • sqs_query_protocol.go:284: "make tests stable. (CodexP2 + Gemini high.)"
  • sqs_query_protocol_test.go:90: "SQS query reference. CodexP1 + Gemini both flagged the prior"
  • sqs_query_protocol_test.go:117: "(lower index wins). CodexP2 flagged the previous map iteration as"

These don't have PR numbers, but they still attribute to external reviewers — which "rot as the codebase evolves" for the same reasons PR numbers do. They satisfy the spirit of the CLAUDE.md prohibition even without the PR number. The technical reasoning in each case stands without the attribution. Fix this →

[Cosmetic] adapter/sqs_throttle.go:224-225 — sentence split at "a" is awkward post-sweep

Before Round 14:

// path that is supposed to enforce limits (CodeRabbit Major on
// PR #664 round 11). Returning a Throttling response with a
// non-zero retryAfter gives the client a normal back-off cue

After Round 14:

// path that is supposed to enforce limits. Returning a Throttling
// response with a
// non-zero retryAfter gives the client a normal back-off cue

The sentence is grammatically correct but the line break at "response with a" is awkward — "a" at end-of-line dangling before "non-zero" on the next. Functionally harmless, worth reflowing in the same follow-up pass. Fix this →


Summary

Round 14 is clean — all PR-number and PR/round attribution references are removed from the Go implementation files. The four remaining reviewer-tool-name references in sqs_query_protocol.go and sqs_query_protocol_test.go are the only missed items; none block merging. Recommend a minimal follow-up pass to clear those four lines (a one-commit change), then this PR is ready.

@bootjp bootjp merged commit bc6dbd4 into main Apr 28, 2026
9 checks passed
@bootjp bootjp deleted the docs/sqs-phase3-proposals branch April 28, 2026 17:16